1. Model Serving Pipelines with Kubernetes and TensorFlow Serving

    Creating a model serving pipeline with Kubernetes and TensorFlow Serving involves several steps: provisioning a Kubernetes cluster, preparing a Docker image with TensorFlow Serving, setting up the Kubernetes Deployment and Service, and finally deploying your machine learning model for predictions.

    Below, I'll provide a Pulumi program written in Python that sets up the necessary Kubernetes resources to serve a TensorFlow model. The setup creates a Deployment that runs a TensorFlow Serving container, expects the model files to live in a location that TensorFlow Serving can access, and adds a Kubernetes Service to expose the Deployment to network traffic.

    The Pulumi program will assume that you have a Kubernetes cluster up and running and that your kubectl command-line tool is configured to communicate with the cluster. It will use the Pulumi Kubernetes provider to create resources in the cluster.
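
    If your kubeconfig already points at the right cluster, the default provider configuration is all you need. If you prefer to be explicit, for example to target a specific kubeconfig file or context, you could instantiate a named Kubernetes provider along these lines; the file path and context name below are placeholders for your own values:

    import pulumi
    import pulumi_kubernetes as k8s

    # Optional: an explicit provider targeting a specific kubeconfig/context.
    # The file path and context name are placeholders; adjust them to your cluster.
    k8s_provider = k8s.Provider(
        "tf-serving-cluster",
        kubeconfig="/path/to/kubeconfig",  # contents of, or path to, a kubeconfig file
        context="my-cluster-context",      # optional: pick a specific context
    )

    # Any resource defined later can opt in to this provider with:
    # opts=pulumi.ResourceOptions(provider=k8s_provider)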

    First, a high-level explanation of what each part of the Pulumi program does:

    1. Import Statements: Import the Pulumi Kubernetes package which provides the classes and functions to interact with Kubernetes resources.

    2. Model Data: Identify the location of your TensorFlow model data. This would typically be a path to a volume or a URI from where TensorFlow Serving can load the model.

    3. Deployment: Create a Kubernetes Deployment using the pulumi_kubernetes.apps.v1.Deployment class, which pulls the TensorFlow Serving Docker image and uses the model location to serve the model.

    4. Service: Expose the TensorFlow Serving pod with a Kubernetes Service using the pulumi_kubernetes.core.v1.Service class. This service will forward requests to the TensorFlow Serving pod.

    5. Exports: Export any information about the deployment that you might need, such as the public IP address of the Service.

    Here's the Pulumi program:

    import pulumi
    import pulumi_kubernetes as k8s

    # Name of the deployment
    deployment_name = 'tf-serving-deployment'

    # The Docker image for TensorFlow Serving
    # Replace this with the version you wish to use or your own custom image
    tf_serving_image = 'tensorflow/serving:latest'

    # The port that TensorFlow Serving listens on
    tf_serving_port = 8501

    # TensorFlow Model Server port (container port)
    container_port = 8501

    # The location of the model data
    # You'll need to update this with the location of your model data
    model_data_path = "/models/mymodel"

    # Define the Kubernetes Deployment for TensorFlow Serving
    tf_deployment = k8s.apps.v1.Deployment(
        deployment_name,
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=1,
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels={"app": deployment_name}
            ),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(labels={"app": deployment_name}),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name=deployment_name,
                        image=tf_serving_image,
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=container_port)],
                        args=[
                            "--model_name=mymodel",                  # Name of the model
                            f"--model_base_path={model_data_path}",  # Path to the model data
                        ],
                    )]
                ),
            ),
        ))

    # Define a Kubernetes Service to expose the TensorFlow Serving Deployment
    tf_service = k8s.core.v1.Service(
        'tf-serving-service',
        spec=k8s.core.v1.ServiceSpecArgs(
            type="LoadBalancer",
            selector={"app": deployment_name},
            ports=[k8s.core.v1.ServicePortArgs(
                port=tf_serving_port,
                target_port=container_port,
            )]
        ))

    # Export the Service's IP address
    pulumi.export('tf_serving_ip', tf_service.status.apply(
        lambda status: status.load_balancer.ingress[0].ip if status.load_balancer.ingress else None))

    In this Pulumi program:

    • We use a Deployment to manage TensorFlow Serving pods. The deployment ensures that the desired number of pods, with the appropriate TensorFlow Serving Docker image and configurations, are running and available to serve the model.

    • We then create a Kubernetes Service of type LoadBalancer, which has the cloud provider provision a load balancer to route traffic to the TensorFlow Serving pod. The service targets port 8501 on the pods, the default port on which TensorFlow Serving exposes its REST API (the gRPC API listens on port 8500 by default).

    After running this Pulumi program, you will have a Deployment and a Service in your Kubernetes cluster serving your TensorFlow model.
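
    Once the LoadBalancer has an external IP (the tf_serving_ip stack output), you can send prediction requests to TensorFlow Serving's REST endpoint at /v1/models/mymodel:predict. Here is a minimal sketch using only the Python standard library; the IP address and the input values are placeholders that you would replace with your own:

    import json
    import urllib.request

    serving_ip = "203.0.113.10"  # placeholder: use the value from `pulumi stack output tf_serving_ip`
    url = f"http://{serving_ip}:8501/v1/models/mymodel:predict"

    # The "instances" payload must match your model's expected input shape.
    payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read()))  # e.g. {"predictions": [...]}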

    Please remember to replace model_data_path with the actual path where your TensorFlow model is stored; that path must be reachable from inside the TensorFlow Serving container, for example via a mounted volume (see the sketch below). Additionally, you might need to customize the TensorFlow Serving image or its startup arguments to fit your needs.
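
    How the model files reach that path depends on your setup. One common approach is to mount them from a PersistentVolumeClaim (or to sync them from object storage with an init container). A sketch of the volume wiring, assuming a PVC named model-data-pvc already holds the versioned SavedModel directories, could look like this in place of the PodSpecArgs used above:

    import pulumi_kubernetes as k8s

    # Sketch only: assumes an existing PVC named "model-data-pvc" containing the
    # versioned SavedModel directories. The volumes/volume_mounts entries would be
    # added to the PodSpecArgs and ContainerArgs of the Deployment defined earlier.
    pod_spec = k8s.core.v1.PodSpecArgs(
        volumes=[k8s.core.v1.VolumeArgs(
            name="model-data",
            persistent_volume_claim=k8s.core.v1.PersistentVolumeClaimVolumeSourceArgs(
                claim_name="model-data-pvc",
            ),
        )],
        containers=[k8s.core.v1.ContainerArgs(
            name="tf-serving",
            image="tensorflow/serving:latest",
            ports=[k8s.core.v1.ContainerPortArgs(container_port=8501)],
            args=["--model_name=mymodel", "--model_base_path=/models/mymodel"],
            volume_mounts=[k8s.core.v1.VolumeMountArgs(
                name="model-data",
                mount_path="/models/mymodel",
            )],
        )],
    )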

    To run this program, you'll need the Pulumi CLI installed and access to a Kubernetes cluster. Save the code into a file named __main__.py in a Pulumi project folder and run pulumi up; Pulumi will then provision the resources defined in the program. A minimal project layout is sketched below.
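
    For reference, a typical project folder for this program contains just the __main__.py above, a Pulumi.yaml describing the project, and a requirements.txt (or virtual environment) with the two dependencies; the project name and description below are placeholders:

    # Pulumi.yaml (project name and description are placeholders)
    name: tf-serving
    runtime: python
    description: Serve a TensorFlow model with TensorFlow Serving on Kubernetes

    # requirements.txt
    pulumi
    pulumi-kubernetes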