ML Model Serving with Kubernetes and KFServing

Pulumi · Accepted Answer

To serve a machine learning model on Kubernetes with KFServing, you need a Kubernetes cluster with KFServing installed and a trained model that KFServing can load, either from a storage location or packaged in your own container image. Pulumi's infrastructure-as-code approach can set up the entire workflow, from provisioning a Kubernetes cluster to deploying a KFServing InferenceService.

KFServing, part of Kubeflow, is a serverless framework for deploying machine learning models on Kubernetes. It builds on Knative and offers features such as autoscaling and canary rollouts.

Below I'll provide a brief explanation and then a Pulumi program that sets up a Kubernetes cluster, installs KFServing, and deploys a simple InferenceService with a prebuilt model.

First, we create a Kubernetes cluster using Pulumi's `pulumi_eks` module, which simplifies the setup of an Amazon EKS cluster. After the cluster is provisioned, we install KFServing and its prerequisites by applying their Kubernetes manifests through the `pulumi_kubernetes` provider.

Finally, we define an `InferenceService` resource that tells KFServing how to serve the model. For this example, we'll use KFServing's built-in sklearn predictor with a sample model; in a production environment, you would point it at your own model or supply your own serving image.

```python
import json

import pulumi
import pulumi_eks as eks
import pulumi_kubernetes as k8s

# Create an EKS cluster with the default configuration
eks_cluster = eks.Cluster('my-eks-cluster')

# The cluster's kubeconfig is an Output holding an object, while the Kubernetes
# provider expects a string, so serialize it to JSON
kubeconfig = eks_cluster.kubeconfig.apply(lambda kc: json.dumps(kc))

# Set up the Kubernetes provider using the kubeconfig from the EKS cluster
k8s_provider = k8s.Provider('k8s-provider', kubeconfig=kubeconfig)

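# Install KFServing (Knative and cert-manager are prerequisites)
# The manifest URLs below are examples and should be replaced with the URLs of the
# releases you intend to run, usually obtained from each project's GitHub repository.
# NOTE: if the KFServing manifest already defines the kfserving-system namespace,
# remove this Namespace resource to avoid a conflict.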
kfserving_namespace = k8s.core.v1.Namespace('kfserving-namespace',
    metadata={'name': 'kfserving-system'},
    opts=pulumi.ResourceOptions(provider=k8s_provider))

cert_manager_yaml = k8s.yaml.ConfigFile('cert-manager',
    file='https://github.com/jetstack/cert-manager/releases/download/v1.0.4/cert-manager.yaml',
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[kfserving_namespace]))

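# Knative Serving needs both its CRDs and its core components. A networking
# layer such as Istio is also required and is not installed by this program.
knative_crds_yaml = k8s.yaml.ConfigFile('knative-serving-crds',
    file='https://github.com/knative/serving/releases/download/v0.18.0/serving-crds.yaml',
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[cert_manager_yaml]))

knative_serving_yaml = k8s.yaml.ConfigFile('knative-serving',
    file='https://github.com/knative/serving/releases/download/v0.18.0/serving-core.yaml',
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[knative_crds_yaml]))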

kfserving_yaml = k8s.yaml.ConfigFile('kfserving',
    file='https://github.com/kubeflow/kfserving/releases/download/v0.5.0/kfserving.yaml',
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[knative_serving_yaml]))

# Define an InferenceService using a prebuilt sklearn model image
sklearn_inference_service = k8s.yaml.ConfigGroup(
    'sklearn-inferenceservice',
    files=['./sklearn-inferenceservice.yaml'],
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[kfserving_yaml])
)

# Export the cluster's kubeconfig
pulumi.export('kubeconfig', kubeconfig)
```

Replace `'./sklearn-inferenceservice.yaml'` with the actual path to your InferenceService manifest file. The manifest should look something like this:

```yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: "kfserving-system"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
```

In this InferenceService manifest, `storageUri` points to a Google Cloud Storage bucket containing the trained machine learning model. KFServing will pull this model and serve it. You will need to change this URI to point to your model's storage location.
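If you would rather generate the manifest than maintain it by hand, a minimal Python sketch could look like the following. The helper name `inference_service_manifest` and the `s3://my-bucket/...` URI are hypothetical placeholders; the field layout simply mirrors the v1alpha2 manifest above, and a non-public bucket would also need storage credentials that are not covered here.

```python
import json


def inference_service_manifest(name: str, namespace: str, storage_uri: str) -> dict:
    """Build a v1alpha2 InferenceService manifest for the sklearn predictor.

    Hypothetical helper: it only mirrors the YAML manifest shown above.
    """
    return {
        "apiVersion": "serving.kubeflow.org/v1alpha2",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "default": {
                "predictor": {
                    "sklearn": {"storageUri": storage_uri},
                }
            }
        },
    }


# Placeholder bucket path (an assumption); replace it with your model's location.
manifest = inference_service_manifest(
    name="sklearn-iris",
    namespace="kfserving-system",
    storage_uri="s3://my-bucket/models/sklearn/iris",
)

# JSON is valid YAML, so this output can be saved as the manifest file
# that the Pulumi program loads.
print(json.dumps(manifest, indent=2))
```

Generating the manifest this way keeps the model location in one variable, which is convenient when the same program deploys several models.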

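Before running this program, make sure your AWS credentials are configured for Pulumi. The program creates the EKS cluster itself and derives Kubernetes access from the cluster's kubeconfig, so you do not need an existing cluster. Your trained model should be uploaded to a storage location that KFServing can read, such as the bucket referenced by `storageUri`. A container image in a registry is only needed if you serve the model with your own custom image.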

After running this program with Pulumi, you should have an EKS cluster running with KFServing installed, ready to serve your machine learning model. You can use Kubernetes tools like `kubectl` to check the InferenceService's status and URL (for example, `kubectl get inferenceservice sklearn-iris -n kfserving-system`), then send inference requests to the service over HTTP and receive predictions in the response.
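As a rough illustration of sending an inference request, the sketch below builds a call to the `/v1/models/<name>:predict` endpoint used by KFServing's V1 data plane. The helper name `build_predict_request`, the ingress address, and the service host are all placeholder assumptions; take the real values from your ingress gateway and from the URL reported for the InferenceService.

```python
import json
import urllib.request


def build_predict_request(ingress_url: str, service_host: str,
                          model_name: str, instances: list) -> urllib.request.Request:
    """Build a POST request for the V1 predict endpoint (hypothetical helper)."""
    url = f"{ingress_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # The Host header lets the ingress gateway route to the right service
        headers={"Host": service_host, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values (assumptions): substitute your ingress address and the
# host name taken from the InferenceService URL.
request = build_predict_request(
    ingress_url="http://203.0.113.10",
    service_host="sklearn-iris.kfserving-system.example.com",
    model_name="sklearn-iris",
    instances=[[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
print(request.full_url)

# Uncomment once the cluster and ingress are reachable:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

Each inner list in `instances` is one row of iris features, so the response should contain one prediction per row.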