Persistent Storage for Model Serving with Kubernetes

Question

Pulumi · Accepted Answer

To set up persistent storage for model serving on a Kubernetes cluster, you'll generally need to define a few key resources:

1. **PersistentVolume (PV)**: Represents a piece of storage in your cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
2. **PersistentVolumeClaim (PVC)**: A request for storage by a user that can be consumed by a pod.
3. **StorageClass**: Provides a way for administrators to describe the "classes" of storage they offer, which can be used to dynamically provision storage based on these classes.

For model serving, you may want to use a `StatefulSet`, which suits applications that need stable, unique network identifiers; stable, persistent storage; and ordered, graceful deployment and scaling.
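To make "stable, unique network identifiers" concrete: each StatefulSet pod is named `<statefulset-name>-<ordinal>`, and when a headless Service governs the set, each pod also gets a DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The sketch below only computes those names. The helper `statefulset_pod_dns_names` is hypothetical, and the `default` namespace and `cluster.local` domain are assumptions about your cluster.

```python
def statefulset_pod_dns_names(name, service, replicas,
                              namespace="default",
                              cluster_domain="cluster.local"):
    """Return the per-pod DNS names a StatefulSet's pods receive.

    Assumes a headless Service called `service` governs the StatefulSet;
    namespace and cluster_domain are assumed defaults, not read from a cluster.
    """
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # Names match the StatefulSet and serviceName used in this answer.
    for host in statefulset_pod_dns_names("model-server", "model-service", replicas=2):
        print(host)
```

Because the ordinal is part of the name, a pod that is rescheduled comes back with the same hostname, which is what lets clients address a specific replica.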

A common setup runs a specialized model-serving container, such as TensorFlow Serving, NVIDIA Triton, or Seldon, and connects it to the storage where the models are kept.

Here's a program that sets up a persistent storage-backed StatefulSet suitable for serving models in Kubernetes. The code uses the Pulumi Kubernetes provider and assumes that you have a Kubernetes cluster already set up and configured with Pulumi.

```python
import pulumi
import pulumi_kubernetes as k8s

# Create a StorageClass for dynamic provisioning
storage_class = k8s.storage.v1.StorageClass("model-storage-class",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-storage-class",
    ),
    provisioner="k8s.io/minikube-hostpath",  # Minikube's hostPath provisioner; replace with your cluster's, e.g. ebs.csi.aws.com or disk.csi.azure.com
    reclaim_policy="Retain",
    volume_binding_mode="Immediate"
)

# Create a PersistentVolumeClaim to request storage
pvc = k8s.core.v1.PersistentVolumeClaim("model-pvc",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-pvc",
    ),
    spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
        access_modes=["ReadWriteOnce"],  # This should match the access modes supported by your provisioner
        resources=k8s.core.v1.ResourceRequirementsArgs(
            requests={"storage": "5Gi"},  # Request 5GiB of storage
        ),
        storage_class_name=storage_class.metadata.name,
    )
)

# Create a StatefulSet to serve models with TensorFlow Serving, as an example
stateful_set = k8s.apps.v1.StatefulSet("model-stateful-set",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-server",
    ),
    spec=k8s.apps.v1.StatefulSetSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(
            match_labels={"app": "model-server"},
        ),
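        service_name="model-service",  # Governing (headless) Service; it must exist in the cluster
        replicas=1,  # The single ReadWriteOnce claim above can only be mounted from one node, so keep this at 1 unless you switch to per-replica claims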
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(
                labels={"app": "model-server"},
            ),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[
                    k8s.core.v1.ContainerArgs(
                        name="model-container",
                        image="tensorflow/serving",  # Image can be replaced with any model serving container
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=8501)],
                        # Mount the PersistentVolumeClaim
                        volume_mounts=[
                            k8s.core.v1.VolumeMountArgs(
                                name="model-storage",
                                mount_path="/models"  # Path where the model files will be mounted
                            ),
                        ],
                    ),
                ],
                # Define the volumes based on earlier PVC
                volumes=[
                    k8s.core.v1.VolumeArgs(
                        name="model-storage",
                        persistent_volume_claim=k8s.core.v1.PersistentVolumeClaimVolumeSourceArgs(
                            claim_name=pvc.metadata.name,
                        ),
                    ),
                ],
            ),
        ),
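        # No volume_claim_templates here: the pod mounts the standalone PVC defined above.
        # Templates expect PersistentVolumeClaimArgs (not a PVC resource) and create one claim per replica.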
    )
)

# Export the StatefulSet's name (the model server) so it is easy to identify
pulumi.export("model_service_name", stateful_set.metadata.name)
```

In this program, we create a `StorageClass` for dynamic provisioning, which means Kubernetes will automatically create a `PersistentVolume` that matches the `PersistentVolumeClaim` request. We create a `PersistentVolumeClaim` named `model-pvc` to request the actual storage resource, which the `StatefulSet` will use to store model data.

The `StatefulSet`, named `model-server` in the cluster (`model-stateful-set` is its Pulumi resource name), has a single replica and uses the `tensorflow/serving` container image to serve models. The `PersistentVolumeClaim` is mounted into the container at `/models`, the default base path where TensorFlow Serving looks for models.
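Mounting the volume at `/models` is only half of what TensorFlow Serving needs: the `tensorflow/serving` image loads a model from `<base path>/<model name>/<version>/`, where the version directory is a number, and the REST port 8501 exposes predictions at `/v1/models/<model name>:predict`. The following sketch checks a directory for servable versions and builds the predict URL. The helpers `servable_versions` and `predict_url` and the model name `my_model` are hypothetical, used only for illustration.

```python
from pathlib import Path


def servable_versions(base_path, model_name):
    """Return the numeric version directories under <base_path>/<model_name>, ascending.

    Non-numeric entries and plain files are ignored, since only numbered
    version directories are treated as model versions.
    """
    model_dir = Path(base_path) / model_name
    if not model_dir.is_dir():
        return []
    return sorted(
        int(entry.name)
        for entry in model_dir.iterdir()
        if entry.is_dir() and entry.name.isdecimal()
    )


def predict_url(host, model_name, port=8501):
    """Build the REST predict endpoint for a served model."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


if __name__ == "__main__":
    # "/models" matches the mount path used in the StatefulSet; "my_model" is a placeholder.
    print(servable_versions("/models", "my_model"))
    print(predict_url("model-server-0.model-service", "my_model"))
```

If the version list comes back empty, the volume is mounted but holds nothing the server can load, which is a common reason a freshly provisioned claim yields a pod that starts but serves no model.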

This setup ensures that the model data remains persistent across restarts and rescheduling of the pods within the `StatefulSet`. If you're using this in production, make sure to replace `"k8s.io/minikube-hostpath"` with your actual storage provisioner, and `"tensorflow/serving"` with the container image you’re using to serve your models.

You can apply this Pulumi program with the `pulumi up` command, and it will create the necessary resources on your Kubernetes cluster. This is a solid foundation for model serving. However, depending on your specific use case, you might need to customize storage options, resource requests, or the deployment strategy of your `StatefulSet`.
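One customization worth considering: the StatefulSet refers to a governing Service named `model-service`, but the program never creates it. A minimal sketch of a headless Service that could fill that gap follows, meant to be added to the same Pulumi program. The port name `rest` is an assumption chosen for illustration, and the selector assumes the `app: model-server` label used above.

```python
import pulumi_kubernetes as k8s

# Headless Service (cluster_ip="None") that gives each StatefulSet pod a stable DNS name.
# The name must match the service name the StatefulSet refers to ("model-service").
headless_service = k8s.core.v1.Service("model-service",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-service",
    ),
    spec=k8s.core.v1.ServiceSpecArgs(
        cluster_ip="None",  # "None" makes the Service headless: no virtual IP, DNS resolves to pod IPs
        selector={"app": "model-server"},  # Must match the pod template labels
        ports=[
            # "rest" is an assumed port name; 8501 is the container port exposed above
            k8s.core.v1.ServicePortArgs(name="rest", port=8501, target_port=8501),
        ],
    ),
)
```

A headless Service does not load-balance through a cluster IP, so if clients need a single stable endpoint you may also want a regular `ClusterIP` or `LoadBalancer` Service with the same selector.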