MLOps — Part IV Pipelines orchestrator and SSO setup

Bhagat Khemchandani
3 min read · Feb 5, 2022


In this part, let's explore some ways of smoothly integrating Kubeflow pipelines with your existing systems.

Orchestrator

You might be wondering why we need this when Kubeflow is itself an orchestrator.
Based on the challenges discussed in Part I and Part II, we set the following goals to define the success of this solution.

First, data scientists should not need to learn Kubeflow or develop pipelines themselves; their focus should be on model development.

Next, machine learning pipelines/workflows should be standardized to ensure consistency across teams in data preparation, model training, evaluation, and deployment.

Lastly, while Kubeflow is a tool of choice at present, in the future we might want to switch to a better alternative (should there be one) or even go hybrid with multiple tools.

It is because of the above goals that an orchestrator is put in place. The purpose of this orchestrator is to expose APIs that trigger the different workflows and to keep the Kubeflow-specific details out of the callers' way.

APIs like /train, /deploy, and /evaluate have been defined. They accept parameters such as a git repository, git SHA, dataset version, environment variables, and the train/test split, which are then forwarded to Kubeflow.
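To make the shape of such a request concrete, here is a minimal sketch of how a /train payload could be validated and flattened into pipeline arguments. The class name, field names, and the `env_` prefix are illustrative assumptions, not the actual service contract.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrainRequest:
    """Hypothetical body of a POST /train call; field names are illustrative."""
    git_repository: str
    git_sha: str
    dataset_version: str
    test_split: float = 0.2  # fraction of the data held out for testing
    env: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.git_sha:
            raise ValueError("git_sha is required so that a run can be reproduced")
        if not 0.0 < self.test_split < 1.0:
            raise ValueError("test_split must be strictly between 0 and 1")


def to_pipeline_arguments(request: TrainRequest) -> Dict[str, str]:
    """Flatten the request into string arguments for a pipeline run."""
    args = {
        "git_repository": request.git_repository,
        "git_sha": request.git_sha,
        "dataset_version": request.dataset_version,
        "train_split": str(round(1.0 - request.test_split, 6)),
        "test_split": str(request.test_split),
    }
    # Environment variables get a prefix so they cannot clash with the fixed keys.
    for key, value in request.env.items():
        args[f"env_{key}"] = value
    return args
```

Validating at the API boundary means a bad split or a missing SHA is rejected before any cluster resources are used.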

All the Kubeflow-specific implementation details are abstracted away by this service. It uses kfp-client to interact with the cluster. Because the cluster sits behind an authentication layer, a session cookie must be passed with every Kubeflow API request. Take a look at Logging in to Kubeflow to get the cookie programmatically.
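As a rough sketch of what this looks like with the kfp SDK: the client accepts the session cookie as a plain `name=value` string. The cookie name `authservice_session` is the one the oidc-authservice normally sets; the function names, host, and package path are assumptions for illustration, and the exact client signature may vary between kfp versions.

```python
from typing import Dict


def session_cookie_header(session_value: str, name: str = "authservice_session") -> str:
    """Format the session cookie as the name=value string kfp.Client expects."""
    if not session_value:
        raise ValueError("an empty session cookie would be rejected by the auth service")
    return f"{name}={session_value}"


def submit_pipeline(
    host: str,
    session_value: str,
    package_path: str,
    arguments: Dict[str, str],
    namespace: str,
    experiment_name: str = "default",
):
    """Submit a compiled pipeline package to Kubeflow Pipelines (sketch only)."""
    import kfp  # imported lazily so the helper above works without the SDK installed

    client = kfp.Client(
        host=host,  # e.g. "https://<your-kubeflow-host>/pipeline"
        cookies=session_cookie_header(session_value),
        namespace=namespace,
    )
    return client.create_run_from_pipeline_package(
        package_path,
        arguments=arguments,
        experiment_name=experiment_name,
        namespace=namespace,
    )
```

Keeping the kfp import inside one function is also what makes it feasible to swap Kubeflow for another tool later without touching the API layer.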

We also use MLflow as our tracking server, so it is the responsibility of this orchestrator to log workflow parameters to MLflow.
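Logging the workflow parameters could look roughly like the sketch below. The function names and the `workflow` key are my own assumptions; the MLflow calls used (`set_tracking_uri`, `set_experiment`, `start_run`, `log_params`) are part of its public tracking API.

```python
from typing import Any, Dict


def to_mlflow_params(workflow: str, arguments: Dict[str, Any]) -> Dict[str, str]:
    """Stringify workflow arguments, since MLflow stores parameter values as strings."""
    params = {key: str(value) for key, value in arguments.items()}
    params["workflow"] = workflow  # record which API triggered the run
    return params


def log_workflow(tracking_uri: str, experiment: str, workflow: str,
                 arguments: Dict[str, Any]) -> str:
    """Create an MLflow run for the workflow and return its run id (sketch only)."""
    import mlflow  # imported lazily so the pure helper above has no dependencies

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=workflow) as run:
        mlflow.log_params(to_mlflow_params(workflow, arguments))
        return run.info.run_id
```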

SSO setup

Most organizations rely on a central directory service, such as an LDAP server, to handle authentication and group membership. My organization is no different and uses Azure AD for this purpose.

To leverage the existing setup, we decided to integrate Azure AD with AWS Cognito for user authentication and authorization.

AWS Cognito scales to millions of users and supports sign-in with social identity providers, such as Facebook, Google, and Amazon, and enterprise identity providers via SAML 2.0.

Kubeflow uses Dex as a federated OpenID Connect (OIDC) provider, and Dex can be integrated with Cognito to provide authentication and identity services.

Here is the OIDC Flow involving Kubeflow, Dex, Istio gateway and the user.

To authenticate Kubeflow users with Azure AD, I have:

  1. Created a Cognito User Pool.
  2. Created an Azure AD User Group to handle group membership.
  3. Set up a SAML provider between Cognito and Azure AD.
  4. Set up an AWS ALB (pointing to the istio-gateway service) as an HTTPS endpoint, as Cognito supports HTTPS redirect_urls only.
  5. Created an App Client in Cognito.
  6. Patched the Kubeflow Dex to act as a Cognito client using a Cognito connector.
  7. Patched the Kubeflow oidc-authservice with the new Dex URL.
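To give an idea of what the Dex patch in step 6 might contain, here is a sketch that builds a connector entry as a Python dict and prints it as JSON (which is also valid YAML). It assumes Dex's generic OIDC connector is used to talk to Cognito, and that Dex is served under /dex on the Kubeflow host; the region, pool ID, client ID, and host are placeholders, not values from my setup.

```python
import json
from typing import Any, Dict


def cognito_issuer(region: str, user_pool_id: str) -> str:
    """Build the OIDC issuer URL of a Cognito User Pool."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def dex_oidc_connector(region: str, user_pool_id: str, client_id: str,
                       client_secret_ref: str, kubeflow_host: str) -> Dict[str, Any]:
    """Return one entry for the `connectors` list of a Dex config (assumed layout)."""
    return {
        "type": "oidc",
        "id": "cognito",
        "name": "Cognito",
        "config": {
            "issuer": cognito_issuer(region, user_pool_id),
            "clientID": client_id,
            # Pass a reference to the secret, not the literal value.
            "clientSecret": client_secret_ref,
            # Must match a callback URL registered on the Cognito App Client.
            "redirectURI": f"https://{kubeflow_host}/dex/callback",
            "scopes": ["openid", "email", "profile"],
        },
    }


if __name__ == "__main__":
    connector = dex_oidc_connector(
        region="us-east-1",                       # placeholder
        user_pool_id="us-east-1_EXAMPLE",         # placeholder
        client_id="example-app-client-id",        # placeholder
        client_secret_ref="$COGNITO_CLIENT_SECRET",
        kubeflow_host="kubeflow.example.com",     # placeholder
    )
    print(json.dumps({"connectors": [connector]}, indent=2))
```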

Domain isolation using Namespaces/Profiles

For isolation, Kubernetes uses Namespaces. A Kubeflow Profile is a unique configuration for a user, which determines their access privileges and is defined by the administrator.

Kubeflow multi-user isolation is configured by Kubeflow administrators. Administrators configure Kubeflow User Profiles/Bindings for each user. After the configuration is created and applied, a user can only access the Kubeflow components that the Administrator has configured for them. The configuration limits unauthorized users from viewing or accidentally deleting artifacts.
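As an illustration, an administrator could generate the Profile manifest for a user with a small helper like the one below. The helper name and the example values are hypothetical, and the `apiVersion` is an assumption (older Kubeflow releases use `kubeflow.org/v1beta1`), so check it against your installation.

```python
import json
import re
from typing import Any, Dict

# A Profile name becomes a namespace name, so it must be a valid DNS-1123 label.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def profile_manifest(profile_name: str, owner_email: str) -> Dict[str, Any]:
    """Build a Kubeflow Profile manifest owned by a single user."""
    if len(profile_name) > 63 or not _DNS_LABEL.fullmatch(profile_name):
        raise ValueError(f"{profile_name!r} is not a valid namespace name")
    return {
        "apiVersion": "kubeflow.org/v1",  # assumption; may be v1beta1 on older clusters
        "kind": "Profile",
        "metadata": {"name": profile_name},
        "spec": {"owner": {"kind": "User", "name": owner_email}},
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(profile_manifest("team-a", "alice@example.com"), indent=2))
```

The owner name should match the email claim that arrives through the SSO flow, otherwise the user will log in but see no namespace.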

As mentioned earlier, the purpose of this series is not to go into too much detail, but to give you an overall idea of how you can design a working solution by using some out-of-the-box tools along with some in-house engineering.
Feel free to post any questions/suggestions in the comments section.

If you want to learn more about the login workflow in Kubeflow, refer to:

i.) https://www.arrikto.com/blog/kubeflow/news/kubeflow-authentication-with-istio-dex/

ii.) https://github.com/ajmyyra/ambassador-auth-oidc

Continue reading Part V to learn about Dataset versioning using DVC and EFS in Kubeflow Pipelines
