Persistent Storage for Model Serving with Kubernetes
To set up persistent storage for model serving on a Kubernetes cluster, you'll generally need to define a few key resources:
- PersistentVolume (PV): Represents a piece of storage in your cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
- PersistentVolumeClaim (PVC): A request for storage by a user that can be consumed by a pod.
- StorageClass: Provides a way for administrators to describe the "classes" of storage they offer, which can be used to dynamically provision storage based on these classes.
For model serving, you may want to use a StatefulSet in Kubernetes, which is good for applications that require stable, unique network identifiers, stable, persistent storage, and ordered, graceful deployment and scaling.
A common use case involves using a specialized container for serving models, such as TensorFlow Serving, NVIDIA Triton, or Seldon, and connecting it with storage where the models are stored.
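To make "stable, unique network identifiers" concrete: Kubernetes names each StatefulSet pod `<statefulset-name>-<ordinal>`, and when the governing Service is headless, each pod is also reachable under a predictable DNS name. The following is a minimal sketch of that naming scheme. The names `model-server` and `model-service` are placeholders, and the `default` namespace and `cluster.local` cluster domain are assumptions that may differ on your cluster.

```python
def stateful_pod_dns_names(statefulset, service, replicas,
                           namespace="default", cluster_domain="cluster.local"):
    """Return the predictable DNS name of each pod in a StatefulSet.

    Assumes the governing Service is headless; namespace and cluster
    domain are illustrative defaults, not values read from a cluster.
    """
    names = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"  # pod names are <statefulset>-<ordinal>
        names.append(f"{pod}.{service}.{namespace}.svc.{cluster_domain}")
    return names


if __name__ == "__main__":
    for name in stateful_pod_dns_names("model-server", "model-service", replicas=2):
        print(name)
```

Because the names depend only on the StatefulSet name and the ordinal, a pod that is rescheduled comes back under the same identity.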
Here's a program that sets up a persistent storage-backed StatefulSet suitable for serving models in Kubernetes. The code uses the Pulumi Kubernetes provider and assumes that you have a Kubernetes cluster already set up and configured with Pulumi.
import pulumi
import pulumi_kubernetes as k8s

# Create a StorageClass for dynamic provisioning
storage_class = k8s.storage.v1.StorageClass("model-storage-class",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-storage-class",
    ),
    # Minikube's hostPath provisioner; replace with your cluster's provisioner
    # (for example, one backed by AWS EBS or Azure Disk)
    provisioner="k8s.io/minikube-hostpath",
    reclaim_policy="Retain",
    volume_binding_mode="Immediate",
)

# Create a PersistentVolumeClaim to request storage
pvc = k8s.core.v1.PersistentVolumeClaim("model-pvc",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-pvc",
    ),
    spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
        # This should match the access modes supported by your provisioner
        access_modes=["ReadWriteOnce"],
        resources=k8s.core.v1.ResourceRequirementsArgs(
            requests={"storage": "5Gi"},  # Request 5GiB of storage
        ),
        storage_class_name=storage_class.metadata.name,
    ),
)

# Create a StatefulSet to serve models with TensorFlow Serving, as an example
stateful_set = k8s.apps.v1.StatefulSet("model-stateful-set",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="model-server",
    ),
    spec=k8s.apps.v1.StatefulSetSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(
            match_labels={"app": "model-server"},
        ),
        # Name of the governing Service; this program does not create it
        service_name="model-service",
        # With a single shared ReadWriteOnce claim, extra replicas can only
        # mount the volume if they are scheduled on the same node
        replicas=1,
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(
                labels={"app": "model-server"},
            ),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[
                    k8s.core.v1.ContainerArgs(
                        name="model-container",
                        # Image can be replaced with any model serving container
                        image="tensorflow/serving",
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=8501)],
                        # Mount the PersistentVolumeClaim
                        volume_mounts=[
                            k8s.core.v1.VolumeMountArgs(
                                name="model-storage",
                                mount_path="/models",  # Path where the model files will be mounted
                            ),
                        ],
                    ),
                ],
                # Define the volumes based on the PVC created earlier
                volumes=[
                    k8s.core.v1.VolumeArgs(
                        name="model-storage",
                        persistent_volume_claim=k8s.core.v1.PersistentVolumeClaimVolumeSourceArgs(
                            claim_name=pvc.metadata.name,
                        ),
                    ),
                ],
            ),
        ),
        # volume_claim_templates is omitted: it takes claim specifications rather
        # than an existing PVC resource, and the pod already mounts model-pvc above
    ),
)

# Export the StatefulSet name to easily identify it
pulumi.export("model_stateful_set_name", stateful_set.metadata.name)
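The StatefulSet above refers to a governing Service called "model-service", but the program does not define one. As a rough sketch of what that Service could look like, the snippet below builds a headless Service manifest (clusterIP set to None) as a plain Python dictionary. The helper name `headless_service_manifest` and the port name `rest` are hypothetical, chosen for illustration; the selector and port 8501 mirror the labels and container port used in the program.

```python
import json


def headless_service_manifest(name, app_label, port):
    """Build a headless Service manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",  # headless: DNS resolves directly to pod IPs
            "selector": {"app": app_label},  # must match the pod template labels
            "ports": [{"name": "rest", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    manifest = headless_service_manifest("model-service", "model-server", 8501)
    print(json.dumps(manifest, indent=2))
```

The printed JSON is a regular Kubernetes manifest. In a Pulumi program, the same fields would typically map onto the provider's Service resource (cluster_ip, selector, ports), so that the Service is managed alongside the StatefulSet.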
In this program, we create a StorageClass for dynamic provisioning, which means Kubernetes will automatically create a PersistentVolume that matches the PersistentVolumeClaim request. We create a PersistentVolumeClaim named model-pvc to request the actual storage resource, which the StatefulSet will use to store model data.
The StatefulSet, named model-server in the cluster (model-stateful-set is its Pulumi resource name), has a single replica and uses the tensorflow/serving container image to serve models. The PersistentVolumeClaim is mounted into the container at the path /models, where the serving software expects to find the models.
This setup ensures that the model data remains persistent across restarts and rescheduling of the pods within the StatefulSet. If you're using this in production, make sure to replace "k8s.io/minikube-hostpath" with your actual storage provisioner, and "tensorflow/serving" with the container image you're using to serve your models.
You can apply this Pulumi program with the pulumi up command, and it will create the necessary resources on your Kubernetes cluster. Treat it as a starting point for model serving: depending on your specific use case, you might need to customize storage options, resource requests, or the deployment strategy of your StatefulSet.
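One detail worth checking before deploying is the layout of the files on the volume. TensorFlow Serving loads models from numbered version subdirectories, for example /models/my_model/1/, and serves the highest version by default. The sketch below lists the version directories it would consider. The function name `find_model_versions` and the model name `my_model` are hypothetical, and the check only looks at directory names, not at the contents of each SavedModel.

```python
import tempfile
from pathlib import Path


def find_model_versions(base_path, model_name):
    """Return the numeric version directories under base_path/model_name, ascending."""
    model_dir = Path(base_path) / model_name
    if not model_dir.is_dir():
        return []
    versions = [
        int(entry.name)
        for entry in model_dir.iterdir()
        if entry.is_dir() and entry.name.isdecimal()  # skip non-numeric directories
    ]
    return sorted(versions)


if __name__ == "__main__":
    # Build a throwaway directory tree that mimics the mounted volume.
    with tempfile.TemporaryDirectory() as root:
        for version in ("1", "2"):
            (Path(root) / "my_model" / version).mkdir(parents=True)
        (Path(root) / "my_model" / "notes").mkdir()  # ignored: not a version number
        print(find_model_versions(root, "my_model"))
```

An empty result for the path you expect to serve usually means the model was copied to the wrong location on the volume. With the stock tensorflow/serving image, the model name defaults to "model" under /models unless the MODEL_NAME environment variable is set.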