Scalable Model Serving with Custom CRDs
In Kubernetes, Custom Resource Definitions (CRDs) allow you to define custom resources that behave like built-in resources. They are commonly used to extend Kubernetes with new kinds of API objects that are specific to a project or to an organization. For scalable model serving, you might create a CRD to represent a machine learning model and then deploy instances of that custom resource to serve predictions.
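To make the relationship between a CRD and its custom resources concrete, here is a minimal sketch in plain Python that needs neither a cluster nor Pulumi. The helper names (crd_name, api_version, custom_resource_manifest) are hypothetical and exist only for illustration; the group, version, kind and plural values are the ones used in the program further down.

```python
# Illustrative only: hypothetical helpers showing how a custom resource's
# identifying fields relate to the CRD that defines it.

def crd_name(plural: str, group: str) -> str:
    """A CRD's metadata.name must have the form <plural>.<group>."""
    return f"{plural}.{group}"


def api_version(group: str, version: str) -> str:
    """Custom resources set apiVersion to <group>/<version>."""
    return f"{group}/{version}"


def custom_resource_manifest(group, version, kind, name, spec):
    """Build the manifest that a custom resource of the given kind would carry."""
    return {
        "apiVersion": api_version(group, version),
        "kind": kind,
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = custom_resource_manifest(
    "ai.example.com", "v1", "ModelServing",
    "example-model-serving",
    {"image": "example-model-image:v1", "replicas": 3},
)
print(crd_name("modelservings", "ai.example.com"))
print(manifest["apiVersion"], manifest["kind"])
```

The point of the sketch is that the custom resource refers to its CRD only through the group, version and kind; the CRD itself is looked up by its plural name and group.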
In the context of Pulumi and Kubernetes, you would use the CustomResource Python class provided by Pulumi's Kubernetes provider to create and manage these custom resources from your Pulumi program. Additionally, CustomResourceDefinition (CRD) would be used to define the shape and schema of your custom resources.

Let's walk through a simple program to deploy a CRD for a model serving resource, and subsequently create an instance of this custom resource:
- Define a CRD with the necessary schema that Kubernetes will recognize.
- Create a custom resource from this CRD to represent our model-serving workload.
- Deploy, and later scale, that workload by changing this custom resource through Pulumi.
Before we dive into the code, make sure the following prerequisites are met:
- Pulumi CLI installed and configured with the appropriate Kubernetes context.
- Access to a Kubernetes cluster where you have permission to create CRDs and custom resources.
Here's a Pulumi program in Python that demonstrates these steps:
    import pulumi
    import pulumi_kubernetes as k8s

    # Define the CustomResourceDefinition for model serving.
    # The CRD name must be <plural>.<group>.
    model_serving_crd = k8s.apiextensions.v1.CustomResourceDefinition(
        "model-serving-crd",
        metadata=k8s.meta.v1.ObjectMetaArgs(name="modelservings.ai.example.com"),
        spec=k8s.apiextensions.v1.CustomResourceDefinitionSpecArgs(
            group="ai.example.com",
            versions=[k8s.apiextensions.v1.CustomResourceDefinitionVersionArgs(
                name="v1",
                served=True,
                storage=True,
                schema=k8s.apiextensions.v1.CustomResourceValidationArgs(
                    # The Python SDK uses snake_case for this argument
                    # (openAPIV3Schema in the Kubernetes API).
                    open_apiv3_schema=k8s.apiextensions.v1.JSONSchemaPropsArgs(
                        type="object",
                        properties={
                            "spec": k8s.apiextensions.v1.JSONSchemaPropsArgs(
                                type="object",
                                properties={
                                    "image": k8s.apiextensions.v1.JSONSchemaPropsArgs(type="string"),
                                    "replicas": k8s.apiextensions.v1.JSONSchemaPropsArgs(type="integer"),
                                },
                            ),
                        },
                    ),
                ),
            )],
            scope="Namespaced",
            names=k8s.apiextensions.v1.CustomResourceDefinitionNamesArgs(
                plural="modelservings",
                singular="modelserving",
                kind="ModelServing",
                short_names=["ms"],
            ),
        ),
    )

    # Create an instance of the custom ModelServing resource
    model_serving_instance = k8s.apiextensions.CustomResource(
        "model-serving-instance",
        api_version="ai.example.com/v1",
        kind="ModelServing",
        metadata=k8s.meta.v1.ObjectMetaArgs(name="example-model-serving"),
        spec={
            "image": "example-model-image:v1",
            "replicas": 3,
        },
        # Wait for the CRD to exist before creating an object of its kind
        opts=pulumi.ResourceOptions(depends_on=[model_serving_crd]),
    )

    # Export the name of the model serving instance
    pulumi.export("model_serving_name", model_serving_instance.metadata["name"])
Explanation:
- We begin by importing the necessary Pulumi modules. The pulumi_kubernetes module, imported as k8s, contains all the types needed to interact with Kubernetes.
- The model_serving_crd resource defines the schema for a new resource type named ModelServing. It includes the group ai.example.com, the version v1, and a short name ms.
- The spec within the CRD outlines the structure of a ModelServing object, including the image (string) and replicas (integer) fields. The schema as written does not mark either field as required; doing so would need a required list on the spec schema. This is akin to defining the columns of a database table.
- After defining the CRD, we create an instance of it with model_serving_instance. This represents a specific model serving object that we want to deploy in our cluster.
- The depends_on option ensures that the custom resource is not created until the CRD is successfully applied to the Kubernetes cluster.
- Finally, we export the name of our model serving instance as an output for easy access.
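To illustrate what the schema in the CRD buys you, the following is a deliberately simplified type check in plain Python. It is not how the Kubernetes API server implements validation, and the names SPEC_SCHEMA and validate_spec are hypothetical; it only mirrors the two property types declared above.

```python
# Simplified, illustrative validator (not the Kubernetes implementation).
# It checks a spec against the two property types declared in the CRD schema.

SPEC_SCHEMA = {"image": "string", "replicas": "integer"}

_PY_TYPES = {"string": str, "integer": int}


def validate_spec(spec: dict, schema: dict = SPEC_SCHEMA) -> list:
    """Return a list of type errors; the list is empty if the spec is valid."""
    errors = []
    for field, value in spec.items():
        expected = schema.get(field)
        if expected is None:
            continue  # fields not in the schema are ignored in this sketch
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[expected]):
            errors.append(
                f"spec.{field}: expected {expected}, got {type(value).__name__}"
            )
    return errors


good = {"image": "example-model-image:v1", "replicas": 3}
bad = {"image": "example-model-image:v1", "replicas": "three"}
print(validate_spec(good))
print(validate_spec(bad))
```

Because neither field is marked as required in the schema, an empty spec passes this check too, which matches the behaviour of the CRD as defined.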
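This Pulumi program, when executed, will apply the CRD to your Kubernetes cluster and create an instance of the ModelServing custom resource based on that definition. The instance uses the specified image and replicas, showing how a scalable model serving workload can be described and managed with Kubernetes and Pulumi. Keep in mind that the custom resource only records the desired state; something running in the cluster, typically a controller or operator, has to act on it before any model is actually served.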
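A controller for ModelServing objects is outside the scope of the program above, but the translation it would perform can be sketched in plain Python. The function deployment_for, the app label and the container name model-server are assumptions made for this illustration, not part of the original program or of any existing operator.

```python
# Hypothetical sketch: how a controller could map a ModelServing object
# to a Deployment manifest. Function name, labels and container name are
# assumptions for illustration only.

def deployment_for(model_serving: dict) -> dict:
    name = model_serving["metadata"]["name"]
    spec = model_serving["spec"]
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Fall back to a single replica if the field is omitted
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": "model-server", "image": spec["image"]}
                    ]
                },
            },
        },
    }


example = {
    "apiVersion": "ai.example.com/v1",
    "kind": "ModelServing",
    "metadata": {"name": "example-model-serving"},
    "spec": {"image": "example-model-image:v1", "replicas": 3},
}
deployment = deployment_for(example)
print(deployment["spec"]["replicas"])

# Scaling: change replicas on the custom resource and derive the manifest again
scaled = deployment_for({**example, "spec": {**example["spec"], "replicas": 5}})
print(scaled["spec"]["replicas"])
```

With such a controller in place, scaling from Pulumi would amount to changing the replicas value in the custom resource's spec and running pulumi up again; the controller would then reconcile the Deployment to match.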