Scalable Model Serving with Custom CRDs

Pulumi · Accepted Answer

In Kubernetes, Custom Resource Definitions (CRDs) allow you to define custom resources that behave like built-in resources. They are commonly used to extend Kubernetes with new kinds of API objects that are specific to a project or to an organization. For scalable model serving, you might create a CRD to represent a machine learning model and then deploy instances of that custom resource to serve predictions.

In a Pulumi program, you create and manage these custom resources with the `CustomResource` Python class provided by Pulumi's Kubernetes provider. The `CustomResourceDefinition` class defines the custom resource type itself: its API group, names, versions, and schema.

Let's walk through a simple program to deploy a CRD for a model serving resource, and subsequently create an instance of this custom resource:

1. Define a CRD with the necessary schema that Kubernetes will recognize.
2. Create a custom resource from this CRD to represent a model serving workload.
3. Deploy this resource with Pulumi, and scale it later by changing its `replicas` field and redeploying.

Before we dive into the code, make sure the following prerequisites are met:
- Pulumi CLI installed and configured with the appropriate Kubernetes context.
- Access to a Kubernetes cluster where you have permission to create CRDs and custom resources.

Here's a Pulumi program in Python that demonstrates these steps:

```python
import pulumi
import pulumi_kubernetes as k8s

# Define the CustomResourceDefinition for model serving
model_serving_crd = k8s.apiextensions.v1.CustomResourceDefinition(
    "model-serving-crd",
    metadata=k8s.meta.v1.ObjectMetaArgs(name="modelservings.ai.example.com"),
    spec=k8s.apiextensions.v1.CustomResourceDefinitionSpecArgs(
        group="ai.example.com",
        versions=[k8s.apiextensions.v1.CustomResourceDefinitionVersionArgs(
            name="v1",
            served=True,
            storage=True,
            schema=k8s.apiextensions.v1.CustomResourceValidationArgs(
                open_apiv3_schema=k8s.apiextensions.v1.JSONSchemaPropsArgs(
                    type="object",
                    properties={
                        "spec": k8s.apiextensions.v1.JSONSchemaPropsArgs(
                            type="object",
                            properties={
                                "image": k8s.apiextensions.v1.JSONSchemaPropsArgs(type="string"),
                                "replicas": k8s.apiextensions.v1.JSONSchemaPropsArgs(type="integer"),
                            },
                        ),
                    },
                ),
            ),
        )],
        scope="Namespaced",
        names=k8s.apiextensions.v1.CustomResourceDefinitionNamesArgs(
            plural="modelservings",
            singular="modelserving",
            kind="ModelServing",
            short_names=["ms"]
        ),
    )
)

# Create an instance of the custom ModelServing resource
model_serving_instance = k8s.apiextensions.CustomResource(
    "model-serving-instance",
    api_version="ai.example.com/v1",
    kind="ModelServing",
    metadata=k8s.meta.v1.ObjectMetaArgs(name="example-model-serving"),
    spec={
        "image": "example-model-image:v1",
        "replicas": 3
    },
    opts=pulumi.ResourceOptions(depends_on=[model_serving_crd])
)

# Export the name of the model serving instance
pulumi.export("model_serving_name", model_serving_instance.metadata["name"])
```

**Explanation:**

- We begin by importing the necessary Pulumi modules. The `pulumi_kubernetes` module, imported as `k8s`, contains all the types needed to interact with Kubernetes.
- The `model_serving_crd` resource defines the schema for a new resource type named `ModelServing`. It includes the group `ai.example.com`, the version `v1`, and a short name `ms`.
- The `spec` within the CRD outlines the structure of `ModelServing`: an `image` field of type string and a `replicas` field of type integer. Neither field is marked as required in this schema; adding a `required` list to the `spec` schema would enforce their presence. This is akin to defining the columns of a database table.
- After defining the CRD, we create an instance of the custom resource, `model_serving_instance`. It represents a specific model serving object that we want to deploy in our cluster.
- The `depends_on` option ensures that the custom resource is not created until the CRD is successfully applied to the Kubernetes cluster.
- Finally, we export the name of our model serving instance as an output for easy access.
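
To make the schema's role more concrete, here is a small, plain-Python illustration of the kind of type check the two declared fields imply. It is only a sketch: the actual validation is done by the Kubernetes API server against the OpenAPI schema, and the names `SPEC_FIELD_TYPES` and `spec_type_errors` are made up for this example.

```python
# Illustrative only: mirrors the two typed fields declared in the CRD schema.
# The real validation is performed by the Kubernetes API server.
SPEC_FIELD_TYPES = {"image": str, "replicas": int}


def spec_type_errors(spec: dict) -> list[str]:
    """Return one message per declared field whose value has the wrong type."""
    errors = []
    for field, expected in SPEC_FIELD_TYPES.items():
        if field not in spec:
            # The schema in the program has no `required` list, so absence is allowed.
            continue
        value = spec[field]
        # bool is a subclass of int in Python, but a JSON boolean is not an integer.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"spec.{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


# A spec matching the instance in the program produces no errors.
print(spec_type_errors({"image": "example-model-image:v1", "replicas": 3}))
# A string where an integer is declared is reported.
print(spec_type_errors({"image": "example-model-image:v1", "replicas": "3"}))
```

A spec such as `{"replicas": "3"}` is the sort of object the API server would reject once the CRD is installed, because `replicas` is declared as an integer.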

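This Pulumi program, when executed, will apply the CRD to your Kubernetes cluster and create an instance of the `ModelServing` custom resource based on that definition. The instance records the desired `image` and `replicas`. On its own, a custom resource is only stored data: a controller (operator) running in the cluster must watch `ModelServing` objects and create the actual workloads. With such a controller in place, changing `replicas` in the program and redeploying is how you scale model serving with Kubernetes and Pulumi.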
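
As a rough sketch of how the `image` and `replicas` fields could turn into a running workload, assume a hypothetical controller (not part of the program above) that watches `ModelServing` objects and creates one Deployment per object. The function name `deployment_for`, the `app` label, the container name `model`, and the default of one replica are all illustrative assumptions, not anything defined by Pulumi or Kubernetes.

```python
def deployment_for(model_serving: dict) -> dict:
    """Build the Deployment manifest a hypothetical controller might create."""
    name = model_serving["metadata"]["name"]
    spec = model_serving.get("spec", {})
    labels = {"app": name}  # label scheme is an assumption
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Falling back to 1 replica is an assumption for this sketch.
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "model", "image": spec["image"]}]},
            },
        },
    }


# The same object the Pulumi program creates, written as a plain dict.
example = {
    "apiVersion": "ai.example.com/v1",
    "kind": "ModelServing",
    "metadata": {"name": "example-model-serving"},
    "spec": {"image": "example-model-image:v1", "replicas": 3},
}
manifest = deployment_for(example)
print(manifest["spec"]["replicas"])
```

A real controller would also watch for changes and update or delete the Deployment accordingly; this sketch covers only the one-way translation from custom resource to manifest.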