Kubernetes Ingress for Distributed Model Serving

Question

Pulumi · Accepted Answer

In Kubernetes, an Ingress is an API object that manages external access to the services in a cluster, typically HTTP. For your use case of distributed model serving, the Ingress will route external traffic to the appropriate model-serving service based on the request path or host.

To configure an Ingress for distributed model serving, you will need:

1. A Kubernetes cluster with an Ingress controller installed (for example, the NGINX Ingress controller). An Ingress resource has no effect without one.
2. One or more model-serving services deployed in the cluster.
3. An Ingress resource defining the access rules.

Below, you will find a Pulumi program written in Python that demonstrates how to set up an Ingress resource to distribute traffic to two different model-serving services based on the request path.

The program uses the following imports:

* `pulumi_kubernetes` provides the Pulumi resources for Kubernetes objects, including the `Provider`.
* The `Ingress` class (from `networking.v1`) creates the Ingress resource.
* The `Service` class (from `core.v1`) defines the backend services that handle the requests.

Here is the complete program:

```python
import pulumi
from pulumi_kubernetes import Provider
from pulumi_kubernetes.networking.v1 import Ingress
from pulumi_kubernetes.core.v1 import Service

# Create an explicit Kubernetes provider. With no kubeconfig argument it
# falls back to the ambient kubeconfig ($KUBECONFIG or ~/.kube/config).
k8s_provider = Provider(resource_name='k8s')

# Define a service for Model A
model_a_service = Service('model-a-service',
    metadata={
        "name": "model-a-service"
    },
    spec={
        "selector": {
            "app": "model-a"
        },
        "ports": [{
            "port": 80,
            "targetPort": 8080
        }]
    },
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)

# Define a service for Model B
model_b_service = Service('model-b-service',
    metadata={
        "name": "model-b-service"
    },
    spec={
        "selector": {
            "app": "model-b"
        },
        "ports": [{
            "port": 80,
            "targetPort": 8080
        }]
    },
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)

# Create the Ingress resource
ingress = Ingress('model-serving-ingress',
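    metadata={
        "annotations": {
            # NGINX-specific: treat the paths below as regular expressions
            "nginx.ingress.kubernetes.io/use-regex": "true",
            # NGINX-specific: forward "/" plus the second capture group to the backend
            "nginx.ingress.kubernetes.io/rewrite-target": "/$2"
        }
    },
    spec={
        # Assumption: the controller's IngressClass is named "nginx"; adjust for your cluster
        "ingressClassName": "nginx",
        "rules": [{
            "http": {
                "paths": [
                    {
                        "path": "/model-a(/|$)(.*)",
                        # Regex paths are controller-specific, so "Prefix" does not apply
                        "pathType": "ImplementationSpecific",
                        "backend": {
                            "service": {
                                "name": model_a_service.metadata["name"],
                                "port": {
                                    "number": 80
                                }
                            }
                        }
                    },
                    {
                        "path": "/model-b(/|$)(.*)",
                        "pathType": "ImplementationSpecific",
                        "backend": {
                            "service": {
                                "name": model_b_service.metadata["name"],
                                "port": {
                                    "number": 80
                                }
                            }
                        }
                    },
                ]
            }
        }]
    },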
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)

# Export the Ingress name
pulumi.export('ingress_name', ingress.metadata['name'])
```

In this program:
- Two `Service` resources are defined, one for each model-serving service (`model-a` and `model-b`).
- An `Ingress` resource is created that defines rules for routing traffic to these services.
  - Requests whose path starts with `/model-a` are routed to `model-a-service`.
  - Requests whose path starts with `/model-b` are routed to `model-b-service`.
- For simplicity, each `Service` listens on port 80 and forwards to port 8080 on the matching pods (`targetPort`). The Ingress backends reference the Service port, 80.
- The `path` fields are regular expressions: `(/|$)` requires the prefix to end at a `/` or at the end of the path, and `(.*)` captures any sub-path.
- The `nginx.ingress.kubernetes.io/rewrite-target` annotation is specific to the NGINX Ingress controller. With the value `/$2`, the path forwarded to the backend is `/` followed by the second capture group, so a request for `/model-a/predict` reaches the service as `/predict`.
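
To make the path matching and rewrite concrete, here is a minimal sketch in plain Python that approximates what the controller does for the `model-a` rule. It is only an illustration: the real matching happens inside NGINX (which, for example, matches case-insensitively), and the `rewrite` helper is a hypothetical name, not part of any library.

```python
import re
from typing import Optional

# Path pattern copied from the Ingress rule for model-a.
PATH_PATTERN = r"/model-a(/|$)(.*)"


def rewrite(path: str) -> Optional[str]:
    """Approximate the rewrite for one request path; None means the rule does not match."""
    match = re.match(PATH_PATTERN, path)  # re.match anchors at the start of the path
    if match is None:
        return None
    # Equivalent of the rewrite-target "/$2": a slash plus the second capture group.
    return "/" + match.group(2)


print(rewrite("/model-a/predict"))  # /predict
print(rewrite("/model-a"))          # /
print(rewrite("/model-abc"))        # None
```

The last call shows why the pattern contains `(/|$)`: without it, `/model-abc` would also match the `model-a` rule.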

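Ensure that the NGINX Ingress controller (or an equivalent) is installed in your cluster and that the pods behind `model-a-service` and `model-b-service` are deployed and labeled `app: model-a` and `app: model-b`. If your pods use different labels, adjust the `selector` fields to match. Annotations with the `nginx.ingress.kubernetes.io/` prefix only take effect with the NGINX Ingress controller.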
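
A Service only sends traffic to pods whose labels contain every key/value pair of its `selector`. The following sketch mirrors that rule in plain Python so you can sanity-check your labels before deploying. The `selector_matches` helper and the example pod labels are hypothetical; the sketch assumes a non-empty, equality-based selector like the ones in the program.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """Return True when every selector key/value pair appears in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector taken from model-a-service in the program.
service_selector = {"app": "model-a"}

# Hypothetical pod labels; extra labels on the pod do not prevent a match.
good_pod = {"app": "model-a", "version": "v1"}
bad_pod = {"app": "model-b"}

print(selector_matches(service_selector, good_pod))  # True
print(selector_matches(service_selector, bad_pod))   # False
```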

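Finally, the program exports the name of the Ingress resource. Because the Ingress `metadata` sets no explicit `name`, Pulumi generates one from the logical name `model-serving-ingress` plus a random suffix. The export lets you look up the actual name after deployment, for example with `pulumi stack output ingress_name`.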
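
If you serve more than two models, the path entries can be generated instead of written by hand. This is a sketch under stated assumptions, not part of the original program: `model_path` and `model_paths` are hypothetical helpers, model names are restricted to lowercase letters, digits, and hyphens so they are safe inside the regular expression, and `pathType` defaults to `ImplementationSpecific` because regex paths are controller-specific.

```python
import re

# Lowercase letters, digits, and hyphens only, so the name is safe inside a regex path.
_NAME = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def model_path(model: str, service_name: str, port: int = 80,
               path_type: str = "ImplementationSpecific") -> dict:
    """Build one Ingress path entry that routes /<model>/... to the given Service."""
    if not _NAME.fullmatch(model):
        raise ValueError(f"unsupported model name: {model!r}")
    return {
        "path": f"/{model}(/|$)(.*)",
        "pathType": path_type,
        "backend": {
            "service": {
                "name": service_name,
                "port": {"number": port},
            }
        },
    }


def model_paths(models: dict) -> list:
    """Build the path list from a mapping of model name to Service name."""
    return [model_path(model, service) for model, service in models.items()]


paths = model_paths({
    "model-a": "model-a-service",
    "model-b": "model-b-service",
})
print([entry["path"] for entry in paths])
```

The resulting list has the same structure as the `paths` value in the Ingress `spec`, so it could be passed as `"paths": paths` inside the `http` rule.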

Deploy the program using Pulumi CLI commands:
- Run `pulumi up` to preview and deploy the changes.
- Confirm the deployment by selecting 'yes'.
- After a successful deployment, inspect the Ingress with `kubectl get ingress`, or reach the model services through the Ingress's external endpoint.
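
Once the Ingress has an external address, a client might call a model as sketched below using only the Python standard library. Everything specific here is an assumption for illustration: the address `203.0.113.10` is a documentation placeholder, the `predict` sub-path and the JSON payload are hypothetical, and plain HTTP is used because the program configures no TLS.

```python
import json
import urllib.request


def model_url(ingress_address: str, model: str, sub_path: str = "") -> str:
    """Build the external URL for a model served behind the Ingress."""
    return f"http://{ingress_address}/{model}/{sub_path.lstrip('/')}"


def call_model(url: str, payload: dict, timeout: float = 10.0) -> bytes:
    """POST a JSON payload to the model endpoint and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder address and hypothetical endpoint; replace with your own values.
url = model_url("203.0.113.10", "model-a", "predict")
print(url)  # http://203.0.113.10/model-a/predict
# body = call_model(url, {"inputs": [1.0, 2.0, 3.0]})
```

With the rewrite annotation in place, the model server behind `model-a-service` would see this request as `/predict`.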