MLOps for Automated Training, Evaluation, Deployment and Monitoring
This is Part I of a series illustrating how I automated the training, evaluation, and deployment of machine learning models for one of my clients and built a fully functional framework around Kubeflow and MLOps.
- Part I - Takes you through the problem statement, the need for MLOps, and a 10,000-foot view of the project goal and the model lifecycle
- Part II - Focuses on the architecture, the high-level design, and the services involved in realizing the solution
- Part III - Covers Kubeflow Pipelines: the training and validation pipeline, evaluation of pre-trained models, and deployment of a trained model
- Part IV - An orchestrator for Kubeflow Pipelines, and authentication (SSO) setup with Kubeflow, Istio, Dex, and Cognito
- Part V - Dataset versioning using DVC, and using EFS to share datasets across stages
- Part VI - Automated model deployment using KServe/Helm charts for serving models and packaging models on the fly
- Part VII - Carbon footprint/emissions monitoring with Kubernetes and New Relic, and using MLflow to track model training parameters and metrics
Problem Statement
Machine Learning (ML) models built by data scientists represent only a small fraction of the components that make up an enterprise production deployment workflow. To operationalize ML models, data scientists must work closely with multiple other teams, such as business, engineering, and operations. This creates organizational challenges in communication, collaboration, and coordination.
MLOps
MLOps, or machine learning operations, is the set of infrastructure, methods, and practices used to streamline the machine learning life cycle, from model design and data preparation to the monitoring of models in production.
MLOps borrows many principles from DevOps. Both encourage and facilitate collaboration between the people who develop (software engineers and data scientists), the people who manage infrastructure, and other stakeholders. Both emphasize process automation and continuous development so that speed and efficiency are maximized.
Unlike traditional software, a machine learning system is not influenced only by changes to its code. Data is the other critical input that needs to be managed, as are parameters, metadata, logs, and, finally, the model itself.
In addition to solving an organizational challenge, MLOps addresses a technical one: how to make the process reproducible and auditable. By using similar tools and following common workflows, teams can more easily automate the delivery of an end-to-end ML system while ensuring its reproducibility. This, in turn, increases trust in the model's predictions and facilitates auditing.
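To make this concrete, here is a minimal, framework-free sketch of what "more than code" versioning means in practice: a training run is recorded together with its code version, data version, parameters, and metrics, so it can be reproduced and audited later. All names and fields below are illustrative, not part of any specific tool (MLflow and DVC, covered later in the series, do this properly).

```python
import hashlib
import json


def fingerprint(content: bytes) -> str:
    """Content hash used as a version identifier for code or data."""
    return hashlib.sha256(content).hexdigest()[:12]


def make_run_record(code: bytes, data: bytes, params: dict, metrics: dict) -> dict:
    """Bundle everything needed to reproduce and audit a training run.

    Unlike traditional software, a model is a product of code AND data
    AND parameters -- changing any one of them yields a different model.
    """
    return {
        "code_version": fingerprint(code),
        "data_version": fingerprint(data),
        "params": params,
        "metrics": metrics,
    }


record = make_run_record(
    code=b"def train(): ...",
    data=b"author,title\nSmith,Paper A\n",
    params={"lr": 0.001, "epochs": 10},
    metrics={"f1": 0.91},
)
print(json.dumps(record, indent=2))
```

If any of the four inputs changes, the record no longer matches, which is exactly the audit signal a tracking system provides.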
I don’t intend to describe MLOps in depth; there are plenty of resources on the web. Some references:
- Continuous Delivery for Machine Learning (https://martinfowler.com/articles/cd4ml.html)
- Introduction to MLOps by O’Reilly Media
You can also take an online specialization course: Machine Learning Engineering for Production (MLOps) Specialization
Project Mission
At the risk of oversimplification, the mission of my project was to:
- Relieve data scientists of the work of operationalizing machine learning (automated model training, deployment, etc.)
- Establish common practices, with reproducible model training and robustness by design
- Automate most of the infrastructure and builds required during the model development lifecycle
Use case Overview
As with any project, data science teams work to accomplish a task, e.g., extracting authors from research papers, or finding references and citations in manuscripts. For any task, multiple model architectures must be evaluated, and the model best suited to the objectives and metrics is then deployed.
The figure above illustrates the lifecycle of model development, deployment, and maintenance. It is the responsibility of MLOps to provide the tools, pipelines, and underlying infrastructure that make this end-to-end process seamless.
If this has you intrigued, read on to Part II, which focuses on the architecture, the high-level design, and the services involved in realizing the solution.