MLOps - Part III: Kubeflow Pipelines

Bhagat Khemchandani
4 min read · Dec 28, 2021


A pipeline is a description of a machine learning (ML) workflow, including all of the components in the workflow and how the components relate to each other in the form of a graph.
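
To make the graph idea concrete, here is a minimal sketch of a two-step pipeline written with the Kubeflow Pipelines (KFP v1) SDK. The step names, base image, and dataset URI are placeholders; only the SDK calls (create_component_from_func, dsl.pipeline) are the actual KFP APIs.

```python
# Minimal sketch of a two-step Kubeflow pipeline (KFP v1 SDK).
# Step names and the base image are illustrative placeholders.
from kfp import dsl
from kfp.components import create_component_from_func


def prepare_data(dataset_uri: str) -> str:
    """Toy step: pretend to fetch a dataset and return its local path."""
    return "/data/" + dataset_uri.split("/")[-1]


def train_model(data_path: str) -> str:
    """Toy step: pretend to train and return a model artifact path."""
    return data_path + ".model.bin"


prepare_data_op = create_component_from_func(prepare_data, base_image="python:3.9")
train_model_op = create_component_from_func(train_model, base_image="python:3.9")


@dsl.pipeline(name="toy-training-pipeline",
              description="Two steps wired into a graph: prepare -> train")
def toy_pipeline(dataset_uri: str = "s3://bucket/dataset.csv"):
    prepared = prepare_data_op(dataset_uri=dataset_uri)
    # Passing one step's output to the next is what defines an edge in the graph.
    train_model_op(data_path=prepared.output)
```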

In Part III of this series, I will walk you through the components we need for an ML pipeline, how we standardized them, and how we designed reusable workflows for multiple teams.

Use cases

  1. Training/Experimentation with a training script, an input dataset, and fixed hyperparameters
  2. Deployment of a model, or updating an existing deployment with a newer model
  3. Evaluation/Validation of models trained in use case (1) above, or of a pre-trained model
  4. Hyperparameter Tuning
  5. Incremental Learning
  6. Model Monitoring

Let us take a look into some of these use cases and understand the components involved.

First use case - Model Training

Pipeline for model training

Training steps (as seen in the figure):

  • Provision the training datasets and the training script
  • Provision the infrastructure with the desired configuration, e.g. environment variables and hyperparameters
  • Train the model and log metrics and params to a tracking server, e.g. MLflow (sketched below)
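
The last step is mostly a matter of calling the tracking API from inside the training script. A minimal sketch, assuming an in-cluster MLflow tracking server and placeholder experiment, parameter, and metric names:

```python
# Hedged sketch of the "train and log" step. Tracking URI, experiment name,
# params, and metrics are placeholders; the MLflow calls themselves are standard.
import mlflow

mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")  # assumed in-cluster server
mlflow.set_experiment("customer-churn")                                 # hypothetical experiment

with mlflow.start_run(run_name="training-run"):
    mlflow.log_params({"learning_rate": 0.01, "epochs": 20})  # hyperparameters from the repo
    # ... training loop would go here ...
    mlflow.log_metric("val_accuracy", 0.93)                   # illustrative metric
    mlflow.log_artifact("model.pkl")                          # serialized model file
```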

As you can see, there are three components that may vary across experiments:
- Input Datasets
- Training script
- Hyperparameters

To make the training workflows reproducible and consistent across runs, we needed to pin down the above components. So we decided to version-control the datasets using DVC instead of the usual S3/file systems, while training scripts and hyperparameters are versioned using Git.

At the same time, we do not want data scientists to provision the datasets and code themselves, so it is imperative to automate the data and code provisioning stages in the pipeline itself.
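
A minimal sketch of what such an automated provisioning stage could do, assuming the repo URL and Git SHA arrive as pipeline parameters (the values below are placeholders):

```python
# Provisioning stage sketch: check out the training repo at a pinned Git SHA,
# then pull the DVC-tracked datasets referenced by that revision.
import subprocess


def provision(repo_url: str, git_sha: str, workdir: str = "/workspace") -> None:
    # Clone the training repository and pin it to the requested commit.
    subprocess.run(["git", "clone", repo_url, workdir], check=True)
    subprocess.run(["git", "checkout", git_sha], cwd=workdir, check=True)
    # Fetch the dataset versions recorded in the .dvc files at that commit.
    subprocess.run(["dvc", "pull"], cwd=workdir, check=True)


provision("https://github.com/acme/churn-training.git",  # hypothetical repo
          "4f9c2ab")                                      # Git SHA passed as a pipeline parameter
```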

Second use case — Model Deployment

Seamless model deployment is crucial for fast feedback, and it is a stage that should require no involvement from the data scientist. Simply put, model deployment is the stage where the model binary is loaded into memory by a supporting framework, with the required libraries installed, and exposed as an API, all without any manual effort.

Typically a model deployment needs:

  1. Model file(s) — the model serialised to .bin, .pkl, or .zip files and stored on a file system, S3, etc.
  2. Wrapper — the code that loads the model into memory and invokes its predict functions, providing data in the format it expects (a minimal sketch follows this list)
  3. Additional Modules — modules/Python files used by the wrapper code, e.g. to manipulate dataframes, load the model, and other utilities
  4. Packages — standard libraries used by the wrapper, e.g. transformers, torch, NumPy
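
For item 2, here is a minimal sketch of what a wrapper could look like; the class name, file names, and input schema are illustrative, not the actual interface our serving framework expects:

```python
# Wrapper sketch: load the serialized model once, expose a predict() that
# converts the request payload into the shape the model expects.
import pickle
from typing import Any, Dict, List

import pandas as pd


class ModelWrapper:
    def __init__(self, model_path: str = "model.pkl"):
        # Load the model binary into memory at startup.
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, records: List[Dict[str, Any]]) -> List[Any]:
        # Manipulate the raw payload into a dataframe and delegate to the model.
        frame = pd.DataFrame(records)
        return self.model.predict(frame).tolist()
```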

We have designed the deployment pipeline in such a way that the above four components are picked up automatically and baked into an executable artifact, which is then deployed as a REST API to be consumed by other services.

Model Deployment Pipeline
  1. Clone the repository containing the deployment descriptor and the above-mentioned artifacts
  2. Assembly — This stage assembles the wrapper, modules, Python packages, and the trained model into a deployment package
  3. Build Model Image — This stage picks up the deployment package from above, stitches everything together, and bakes a Docker image that can run anywhere and exposes standard REST endpoints to trigger inference on the model
  4. Deploy Model — This stage uses KServe (formerly KFServing) to deploy the model Docker image as an InferenceService and automatically exposes the model as a service over the load balancer (sketched below)
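
As a rough illustration of the last step, the deploy stage can create a KServe InferenceService pointing at the freshly built image. The namespace, service name, and image below are placeholders; the manifest follows KServe's serving.kserve.io/v1beta1 API for a custom predictor container:

```python
# Deploy-stage sketch: create an InferenceService for the model image built
# in the previous stage, using the Kubernetes Python client.
from kubernetes import client, config

config.load_incluster_config()  # the pipeline step itself runs inside the cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "churn-model", "namespace": "models"},
    "spec": {
        "predictor": {
            "containers": [
                {"name": "kserve-container",
                 "image": "registry.example.com/churn-model:4f9c2ab"}  # image from the build stage
            ]
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=inference_service,
)
```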

More use cases - I prefer not to make this an exhaustive story, so the remaining use cases will be detailed in a separate one.

Pipelines are generic
The idea is to define generic, configurable, and reusable pipelines that data scientists can run by providing different parameters, e.g. the training repo Git SHA, the dataset Git SHA for the DVC checkout, the train and test datasets, hyperparameters, etc.

Also, each stage in these pipelines runs in a Kubernetes pod, and a Docker image specific to each stage has been created: for example, one image to clone datasets from DVC, another to run the training script, etc.
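
Putting it together, a data scientist would trigger the shared pipeline with their own parameters rather than writing pipeline code. A sketch, assuming a hypothetical shared pipeline module and parameter names (the client calls themselves are standard KFP v1 SDK):

```python
# Triggering the generic training pipeline with run-specific parameters.
import kfp

from pipelines.training import training_pipeline  # hypothetical shared pipeline definition

client = kfp.Client(host="http://ml-pipeline-ui.kubeflow:80")  # assumed in-cluster endpoint

client.create_run_from_pipeline_func(
    training_pipeline,
    arguments={
        "repo_url": "https://github.com/acme/churn-training.git",
        "git_sha": "4f9c2ab",   # pins the training script and hyperparameters
        "dvc_rev": "4f9c2ab",   # pins the dataset version tracked by DVC
        "experiment_name": "customer-churn",
    },
)
```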

While I understand this is a lot of information in very little space, I feel it is important to give an idea of the overall ecosystem without going into too much depth.

Feel free to provide feedback/ask questions in the comments below, and help me make this more informative.

In the next part, I will take you through the orchestrator for Kubeflow Pipelines and the authentication (SSO) setup for Kubeflow, bringing together Istio, Dex, and Cognito.
