Envoy Ingress for Scalable ML Model Serving

Pulumi · Accepted Answer

To set up an Envoy ingress for scalable ML model serving, you typically run the Envoy proxy inside a Kubernetes cluster as the ingress layer, routing external traffic to the services that host your model. The model itself runs behind a regular Kubernetes Service. To scale it, you can use Kubernetes' Horizontal Pod Autoscaler, which adjusts the number of pods based on CPU utilization or other selected metrics.
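
To make the autoscaling behaviour concrete, here is a minimal sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name, the bounds, and the sample numbers are illustrative assumptions. The real controller also accounts for pod readiness, missing metrics, and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the Horizontal Pod Autoscaler scaling rule."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Two pods averaging 90% CPU against a 50% target scale out to four pods.
print(desired_replicas(2, current_metric=90, target_metric=50))
```

With a CPU utilization target, the percentage is measured against each container's CPU request, so the `requests` values set on the model-serving container directly influence when scaling happens.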

In the code below, we use Pulumi with the `pulumi_kubernetes` provider against an existing Kubernetes cluster. We create a namespace, a Deployment that represents a mock ML model serving application, and an Envoy Deployment that acts as the ingress. I'll include comments to guide you through each step of the process.

Let's start building the Pulumi program:

```python
import pulumi
import pulumi_kubernetes as k8s

# Create a namespace for the ML serving resources in an existing, pre-configured cluster.
# Despite the variable name, this is a Namespace, not a cluster; provision the cluster itself
# separately (for example with AKS, EKS, GKE, or a Pulumi component resource).
cluster = k8s.core.v1.Namespace("ml-serving-cluster")

# Create a Kubernetes Deployment for the ML model serving application.
# This example uses a placeholder image and should be replaced with the actual image that serves your ML model.
app_labels = {"app": "ml-serving"}
ml_serving_deployment = k8s.apps.v1.Deployment(
    "ml-serving-deployment",
    metadata=k8s.meta.v1.ObjectMetaArgs(namespace=cluster.metadata["name"]),
    spec=k8s.apps.v1.DeploymentSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        replicas=2,  # Start with 2 replicas, can be scaled with HorizontalPodAutoscaler.
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(
                    name="ml-serving-container",
                    image="YOUR_ML_MODEL_SERVING_IMAGE",  # Replace with your ML model serving image.
                    ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        requests={"cpu": "100m", "memory": "200Mi"},
                        limits={"cpu": "500m", "memory": "500Mi"},
                    ),
                )],
            ),
        ),
    ),
)

# Create a Service to expose the ML model serving Deployment.
ml_serving_service = k8s.core.v1.Service(
    "ml-serving-service",
    metadata=k8s.meta.v1.ObjectMetaArgs(namespace=cluster.metadata['name'], labels=app_labels),
    spec=k8s.core.v1.ServiceSpecArgs(
        ports=[k8s.core.v1.ServicePortArgs(port=80)],
        selector=app_labels,
        type="ClusterIP",  # Use "LoadBalancer" if you want an external IP.
    ),
)

# Set up Envoy as the ingress controller.
# The Envoy configuration will need to be customized based on the specifics of your application.
envoy_deployment = k8s.apps.v1.Deployment(
    "envoy-deployment",
    metadata=k8s.meta.v1.ObjectMetaArgs(namespace=cluster.metadata["name"]),
    spec=k8s.apps.v1.DeploymentSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels={"app": "envoy-ingress"}),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels={"app": "envoy-ingress"}),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(
                    name="envoy-container",
                    image="envoyproxy/envoy:v1.18.3",  # Use the appropriate Envoy image.
                    ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
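                    # Mount your custom Envoy config here (for example from a ConfigMap).
                    # Without it, Envoy starts with the image's default config and will
                    # not route traffic to the ML serving Service.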
                )],
            ),
        ),
    ),
)

# Expose the Envoy deployment as a service.
envoy_service = k8s.core.v1.Service(
    "envoy-service",
    metadata=k8s.meta.v1.ObjectMetaArgs(namespace=cluster.metadata['name'], labels={"app": "envoy-ingress"}),
    spec=k8s.core.v1.ServiceSpecArgs(
        type="LoadBalancer",
        ports=[k8s.core.v1.ServicePortArgs(port=80)],
        selector={"app": "envoy-ingress"},
    ),
)

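# Export the external address of the Envoy ingress. Some cloud load balancers
# report a hostname instead of an IP, so fall back to the hostname.
ingress = envoy_service.status.load_balancer.ingress[0]
pulumi.export("envoy_ingress_endpoint", ingress.apply(lambda i: i.ip or i.hostname))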
```

In this Pulumi program:

- We create a Kubernetes namespace to serve as the environment for our deployment.
- We then set up a Kubernetes Deployment for our mock ML model serving application. Please replace `YOUR_ML_MODEL_SERVING_IMAGE` with the Docker image that contains your model.
- We then expose our ML model serving Deployment with a Kubernetes Service, allowing it to be accessed within the cluster.
- Next, we create a second Deployment for the Envoy proxy, which serves as our ingress. Note that a standalone Envoy does not watch Kubernetes Ingress resources, so all routing comes from Envoy's own configuration. The Envoy image used here (`envoyproxy/envoy:v1.18.3`) should be replaced with the version that matches your requirements.
- We then expose the Envoy proxy using another Kubernetes Service, this time with `LoadBalancer` type to receive external traffic. The service's endpoint is exported to easily access the ingress IP address.
- Envoy's actual configuration, which depends heavily on your application, is not included. Provide it (for example by mounting a ConfigMap) where the comment in the `envoy-container` spec mentions custom configuration. Until you do, Envoy will not forward requests to the ML serving Service.
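
As a starting point for that custom configuration, the sketch below builds a minimal static Envoy bootstrap (v3 API) that listens for HTTP and forwards every request to a single upstream. Envoy accepts JSON as well as YAML configuration, so the output could be stored in a ConfigMap and mounted into the Envoy container. The function name, the `ml_serving` cluster name, and the upstream host are hypothetical; because Pulumi appends a suffix to auto-named resources, the real host would be derived from the generated Service name and namespace.

```python
import json

HCM_TYPE = ("type.googleapis.com/envoy.extensions.filters.network."
            "http_connection_manager.v3.HttpConnectionManager")
ROUTER_TYPE = "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"


def build_envoy_bootstrap(upstream_host, upstream_port=80, listen_port=80):
    """Return a static Envoy bootstrap that proxies all HTTP traffic to one upstream."""
    upstream_address = {"socket_address": {"address": upstream_host,
                                           "port_value": upstream_port}}
    http_connection_manager = {
        "@type": HCM_TYPE,
        "stat_prefix": "ingress_http",
        "route_config": {
            "name": "local_route",
            "virtual_hosts": [{
                "name": "ml_serving",
                "domains": ["*"],
                # Send every path to the model-serving cluster defined below.
                "routes": [{"match": {"prefix": "/"},
                            "route": {"cluster": "ml_serving"}}],
            }],
        },
        "http_filters": [{"name": "envoy.filters.http.router",
                          "typed_config": {"@type": ROUTER_TYPE}}],
    }
    return {
        "static_resources": {
            "listeners": [{
                "name": "ingress_http",
                "address": {"socket_address": {"address": "0.0.0.0",
                                               "port_value": listen_port}},
                "filter_chains": [{"filters": [{
                    "name": "envoy.filters.network.http_connection_manager",
                    "typed_config": http_connection_manager,
                }]}],
            }],
            "clusters": [{
                "name": "ml_serving",
                "connect_timeout": "1s",
                # STRICT_DNS resolves the Service's cluster DNS name periodically.
                "type": "STRICT_DNS",
                "lb_policy": "ROUND_ROBIN",
                "load_assignment": {
                    "cluster_name": "ml_serving",
                    "endpoints": [{"lb_endpoints": [
                        {"endpoint": {"address": upstream_address}},
                    ]}],
                },
            }],
        }
    }


# Hypothetical in-cluster DNS name: <service>.<namespace>.svc.cluster.local
config = build_envoy_bootstrap("ml-serving-service.ml-serving.svc.cluster.local")
print(json.dumps(config, indent=2))
```

The listen port of 80 mirrors the `container_port` in the program above. If the Envoy container cannot bind to a port below 1024, choose a higher listen port and set the Service's target port to match.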
  
Please keep in mind this program is a simplified representation for educational purposes. In a real-world scenario, you would need to tailor the configuration and resource specifications to match the requirements of your ML model serving application and Envoy setup, including proper PersistentVolumes, ConfigMaps, Secrets, and more advanced features of Kubernetes or cloud-specific managed services like AKS, EKS, or GKE.
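
The program mentions a HorizontalPodAutoscaler in the `replicas` comment but does not define one. As a hedged sketch of what that object could look like, the helper below builds an `autoscaling/v2` manifest as a plain dictionary. The helper name, the replica bounds, the 70% CPU target, and the example Deployment and namespace names are all assumptions; Pulumi appends a suffix to auto-named resources, so the real names would come from the Deployment's and Namespace's metadata.

```python
import json


def hpa_manifest(deployment_name, namespace, min_replicas=2, max_replicas=10,
                 cpu_utilization=70):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest for a Deployment."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment_name}-hpa", "namespace": namespace},
        "spec": {
            # The workload whose replica count the autoscaler manages.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Average CPU utilization, as a percentage of the CPU request.
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_utilization},
                },
            }],
        },
    }


# Hypothetical names; substitute the generated Deployment and Namespace names.
print(json.dumps(hpa_manifest("ml-serving-deployment", "ml-serving"), indent=2))
```

The resulting JSON can be applied with `kubectl apply -f`, or the same fields can be expressed with the provider's typed `k8s.autoscaling.v2.HorizontalPodAutoscaler` resource. CPU-based scaling also requires a metrics source such as metrics-server to be running in the cluster.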