Kubernetes Ingress for Distributed Model Serving
In Kubernetes, an Ingress is an API object that manages external access to the services in a cluster, typically over HTTP or HTTPS. For your use case of distributed model serving, the Ingress will route external traffic to the appropriate model-serving service based on the request path or host.
To configure an Ingress for distributed model serving, you will need:
- A Kubernetes cluster with an Ingress controller installed (for example, the NGINX Ingress controller).
- One or more model-serving services deployed in the cluster.
- An Ingress resource defining the access rules.
Below, you will find a Pulumi program written in Python that demonstrates how to set up an Ingress resource to distribute traffic to two different model-serving services based on the request path.
First, I will include the necessary import statements:
- pulumi_kubernetes is used to interact with Kubernetes resources.
- The Ingress class (from networking.v1) to create an Ingress resource.
- The Service class to define the backend services for handling the requests.
Here is the Pulumi program that demonstrates this:

    import pulumi
    from pulumi_kubernetes import Provider
    from pulumi_kubernetes.networking.v1 import Ingress
    from pulumi_kubernetes.core.v1 import Service

    # Assume that you have a Kubernetes Provider configured for Pulumi.
    k8s_provider = Provider(resource_name='k8s')

    # Define a service for Model A
    model_a_service = Service('model-a-service',
        metadata={"name": "model-a-service"},
        spec={
            "selector": {"app": "model-a"},
            "ports": [{"port": 80, "targetPort": 8080}]
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider)
    )

    # Define a service for Model B
    model_b_service = Service('model-b-service',
        metadata={"name": "model-b-service"},
        spec={
            "selector": {"app": "model-b"},
            "ports": [{"port": 80, "targetPort": 8080}]
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider)
    )

    # Create the Ingress resource
    ingress = Ingress('model-serving-ingress',
        metadata={
            "annotations": {
                # NGINX-specific: treat the paths below as regular expressions.
                "nginx.ingress.kubernetes.io/use-regex": "true",
                # NGINX-specific: forward only the second capture group.
                "nginx.ingress.kubernetes.io/rewrite-target": "/$2"
            }
        },
        spec={
            "rules": [{
                "http": {
                    "paths": [
                        {
                            "path": "/model-a(/|$)(.*)",
                            # Regex paths need ImplementationSpecific; recent
                            # NGINX controller versions reject them under Prefix.
                            "pathType": "ImplementationSpecific",
                            "backend": {
                                "service": {
                                    "name": model_a_service.metadata["name"],
                                    "port": {"number": 80}
                                }
                            }
                        },
                        {
                            "path": "/model-b(/|$)(.*)",
                            "pathType": "ImplementationSpecific",
                            "backend": {
                                "service": {
                                    "name": model_b_service.metadata["name"],
                                    "port": {"number": 80}
                                }
                            }
                        },
                    ]
                }
            }]
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider)
    )

    # Export the Ingress name
    pulumi.export('ingress_name', ingress.metadata['name'])

In this program:
- Two Service resources are defined, one for each model-serving service (model-a and model-b).
- An Ingress resource is created that defines rules for routing traffic to these services.
  - Requests to /model-a will be routed to model-a-service.
  - Requests to /model-b will be routed to model-b-service.
- For simplicity, we assume each model server listens on port 8080 inside its pod. Each Service exposes port 80 and forwards to that port, and the Ingress targets the Services on port 80.
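- Note that the path fields use a regular expression to capture any sub-path; the second capture group, (.*), holds everything after the model prefix.
- The nginx.ingress.kubernetes.io/rewrite-target annotation is specific to the NGINX Ingress controller and indicates how to rewrite the URL path before forwarding the request to the backend service. Here, /$2 forwards only the captured sub-path, so the backend does not see the /model-a or /model-b prefix.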
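To make the rewrite concrete, here is a small sketch in plain Python that mimics what the path pattern and the rewrite target do to an incoming path. It only approximates the controller's behavior with Python's re module (NGINX uses its own regex engine and writes the capture group as $2), and the function name rewrite_path is an illustrative assumption, not part of any Kubernetes or NGINX API.

```python
import re

# Same pattern as the Ingress path for model A.
MODEL_A_PATH = r"/model-a(/|$)(.*)"


def rewrite_path(pattern, request_path):
    """Return the path a backend would see, or None if the pattern does not match.

    Mirrors a rewrite target of "/$2": keep only the second capture group.
    """
    match = re.match(pattern, request_path)
    if match is None:
        return None
    return "/" + match.group(2)


print(rewrite_path(MODEL_A_PATH, "/model-a/v1/predict"))  # /v1/predict
print(rewrite_path(MODEL_A_PATH, "/model-a"))             # /
print(rewrite_path(MODEL_A_PATH, "/model-b/v1/predict"))  # None
```

The (/|$) group is what keeps a path such as /model-abc from matching: after the prefix, the next character must be a slash or the end of the path.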
Ensure that you have installed the NGINX Ingress controller or an equivalent in your cluster and that the model-a and model-b workloads are deployed and properly labeled. Adjust the selector fields to match the labels of your model-serving pods.
Finally, I have exported the name of the Ingress resource so that you can easily find it within your cluster after deployment. This is a standard practice for tracking generated resources in Pulumi.
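Returning to the labeling requirement: a Service selects a pod when every key/value pair in its selector also appears in the pod's labels. The sketch below expresses that rule in plain Python. The pod label sets are made-up examples, and selector_matches is a hypothetical helper rather than a Kubernetes client call.

```python
def selector_matches(selector, pod_labels):
    """True if every selector key/value pair is present in the pod's labels.

    Assumes a non-empty, equality-based selector like the ones in the program.
    """
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector used by model-a-service in the program above.
model_a_selector = {"app": "model-a"}

# Hypothetical pod labels, for illustration only.
matching_pod = {"app": "model-a", "version": "v1"}
other_pod = {"app": "model-b"}

print(selector_matches(model_a_selector, matching_pod))  # True
print(selector_matches(model_a_selector, other_pod))     # False
```

Extra labels on a pod do not prevent a match; a missing or different value for any selector key does, in which case the Service has no endpoints and requests through the Ingress will fail.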
Deploy the program using Pulumi CLI commands:
- Run pulumi up to preview and deploy the changes.
- Confirm the deployment by selecting 'yes'.
- After successful deployment, you can check the Ingress resource using kubectl or access the model services through the Ingress's external endpoint.
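Once the Ingress has an external address, a quick request against each path can confirm the routing. The following is a minimal sketch using only the Python standard library. The address 203.0.113.10 is a placeholder from the documentation IP range, and the health sub-path is an assumption about what your model servers expose.

```python
import urllib.request

# Placeholder address: replace with the external endpoint of your Ingress.
INGRESS_BASE_URL = "http://203.0.113.10"


def model_url(base_url, model, sub_path=""):
    """Build the URL for a model behind the Ingress, e.g. <base>/model-a/health."""
    return f"{base_url.rstrip('/')}/{model}/{sub_path.lstrip('/')}"


def check_model(base_url, model, sub_path="health"):
    """Send a GET request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError if the endpoint cannot be reached.
    """
    url = model_url(base_url, model, sub_path)
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the URLs; call check_model() once the endpoint is reachable.
    for model in ("model-a", "model-b"):
        print(model_url(INGRESS_BASE_URL, model, "health"))
```

With a reachable endpoint, check_model(INGRESS_BASE_URL, "model-a") should return 200 if the request reaches the model-a backend and that backend serves the assumed health path.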