1. K8s-based Model Deployment with NGINX for High Availability


    To deploy a Kubernetes-based model with NGINX for high availability, we will use Pulumi to define and deploy our infrastructure. We will create a Deployment to manage a set of identical pods, each running a container that serves our model, and use NGINX as the ingress controller that manages access to the model service from outside the Kubernetes cluster.

    We will achieve high availability using the following components:

    1. Deployment - Ensures that a specified number of pod replicas are running at any one time. In case a pod fails, the Deployment will replace it.

    2. Service - Provides a stable endpoint for accessing the pods in the Deployment through a single cluster IP address or DNS name.

    3. Ingress Controller - Manages external access to the services, typically through HTTP. We’ll use NGINX as our Ingress controller.

    4. Ingress Resource - Uses rules to forward external HTTP(S) traffic to internal services.

    5. Horizontal Pod Autoscaler (HPA) - Automatically scales the number of pods in a Deployment based on observed CPU utilization or other select metrics.

    Here's a Pulumi program in Python that sets up this configuration. It uses the pulumi_kubernetes package together with pulumi_kubernetes_ingress_nginx, Pulumi's component for installing the NGINX Ingress Controller via its Helm chart:

    import pulumi
    import pulumi_kubernetes as k8s
    import pulumi_kubernetes_ingress_nginx as kubernetes_ingress_nginx

    # Configuration for the NGINX Ingress Controller.
    nginx_ingress_controller = kubernetes_ingress_nginx.IngressController(
        "nginx-ingress-controller",
        controller=kubernetes_ingress_nginx.ControllerArgs(
            service=kubernetes_ingress_nginx.ControllerServiceArgs(
                type="LoadBalancer",  # Use a LoadBalancer to expose the NGINX Ingress Controller.
            ),
            # Additional controller configuration can be added here.
        ),
        # Helm chart options can be set via helm_options if required.
    )

    # Assuming a machine learning model container image is available at
    # "model-image-repo/model:tag". Replace with your actual model image path.
    model_app_labels = {"app": "modelapp"}

    model_deployment = k8s.apps.v1.Deployment(
        "model-deployment",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=3,  # Start with 3 replicas for high availability.
            selector=k8s.meta.v1.LabelSelectorArgs(match_labels=model_app_labels),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(labels=model_app_labels),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name="model-container",
                        image="model-image-repo/model:tag",
                        # Assuming the model serves on port 80.
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
                        # CPU requests are required for the CPU-based HPA below
                        # to compute utilization.
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            requests={"cpu": "250m", "memory": "256Mi"},
                        ),
                    )],
                ),
            ),
        ))

    # Create a Service to expose the Deployment.
    model_service = k8s.core.v1.Service(
        "model-service",
        metadata=k8s.meta.v1.ObjectMetaArgs(
            labels=model_app_labels,
        ),
        spec=k8s.core.v1.ServiceSpecArgs(
            type="ClusterIP",  # ClusterIP keeps the service reachable only within the cluster.
            ports=[k8s.core.v1.ServicePortArgs(
                port=80,         # Service port.
                target_port=80,  # Port on the container.
            )],
            selector=model_app_labels,
        ))

    # Create an Ingress resource to expose the Service externally.
    model_ingress = k8s.networking.v1.Ingress(
        "model-ingress",
        spec=k8s.networking.v1.IngressSpecArgs(
            # ingressClassName replaces the deprecated
            # "kubernetes.io/ingress.class" annotation.
            ingress_class_name="nginx",
            rules=[k8s.networking.v1.IngressRuleArgs(
                http=k8s.networking.v1.HTTPIngressRuleValueArgs(
                    paths=[k8s.networking.v1.HTTPIngressPathArgs(
                        path="/model",
                        path_type="Prefix",
                        backend=k8s.networking.v1.IngressBackendArgs(
                            service=k8s.networking.v1.IngressServiceBackendArgs(
                                name=model_service.metadata.name,
                                port=k8s.networking.v1.ServiceBackendPortArgs(number=80),
                            ),
                        ),
                    )],
                ),
            )],
        ))

    # Apply an HPA to autoscale the Deployment based on CPU utilization.
    model_hpa = k8s.autoscaling.v1.HorizontalPodAutoscaler(
        "model-hpa",
        spec=k8s.autoscaling.v1.HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=k8s.autoscaling.v1.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name=model_deployment.metadata.name,
            ),
            min_replicas=3,   # Never fall below the 3 replicas we started with.
            max_replicas=10,  # Maximum number of replicas for the HPA to scale to.
            target_cpu_utilization_percentage=50,  # Scale out above 50% average CPU.
        ))

    # Export the external address assigned to the Ingress once the controller's
    # load balancer is provisioned (on some clusters this is a hostname, not an IP).
    pulumi.export("ingress_ip", model_ingress.status.load_balancer.ingress[0].ip)
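    For high availability it also helps to give the model container readiness and liveness probes, so the Service only routes traffic to pods that can actually serve and crashed pods get restarted. Here's a minimal sketch of what that could look like for the container above, assuming the model image exposes a health endpoint; the "/healthz" path and the timing values are placeholders to adjust for your model server:

    import pulumi_kubernetes as k8s

    # Hypothetical health checks for the model container; "/healthz" and the
    # timings below are assumptions, not part of the program above.
    probed_container = k8s.core.v1.ContainerArgs(
        name="model-container",
        image="model-image-repo/model:tag",
        ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
        readiness_probe=k8s.core.v1.ProbeArgs(
            http_get=k8s.core.v1.HTTPGetActionArgs(path="/healthz", port=80),
            initial_delay_seconds=5,
            period_seconds=10,
        ),
        liveness_probe=k8s.core.v1.ProbeArgs(
            http_get=k8s.core.v1.HTTPGetActionArgs(path="/healthz", port=80),
            initial_delay_seconds=15,
            period_seconds=20,
        ),
    )

    Using this ContainerArgs in place of the one in the Deployment above means traffic only reaches pods whose model has finished loading.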

    This program does the following:

    • Deploys an NGINX Ingress Controller to manage traffic into the cluster.
    • Sets up a Deployment for your model with 3 replicas for redundancy.
    • Exposes the Deployment within the cluster with a ClusterIP Service.
    • Creates an Ingress resource to route external traffic to the Service through the NGINX controller, which is itself exposed by the LoadBalancer Service configured on the controller.
    • Sets up an HPA to allow the number of model replicas to scale up or down based on CPU utilization, with a minimum of 3 and a max of 10 replicas.
    • Exports the external IP address reported on the Ingress once the controller's load balancer is provisioned, which is what you use to access your model from outside the cluster.
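
    One more piece that complements replicas and autoscaling is a PodDisruptionBudget, which keeps a minimum number of model pods running through voluntary disruptions such as node drains during cluster upgrades. A minimal sketch, where the resource name and the min_available value of 2 are illustrative choices:

    import pulumi_kubernetes as k8s

    # Keep at least 2 model pods available during voluntary disruptions;
    # min_available=2 is an illustrative choice for a 3-replica Deployment.
    model_pdb = k8s.policy.v1.PodDisruptionBudget(
        "model-pdb",
        spec=k8s.policy.v1.PodDisruptionBudgetSpecArgs(
            min_available=2,
            selector=k8s.meta.v1.LabelSelectorArgs(match_labels={"app": "modelapp"}),
        ))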

    The NGINX Ingress Controller is a robust choice for managing ingress: it's widely used and supports many features out of the box. Make sure the credentials Pulumi deploys with have sufficient permissions in the cluster, since installing the controller creates cluster-scoped resources such as RBAC roles and the LoadBalancer Service.

    The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods up or down with load, ensuring the deployment can handle varying traffic while optimizing resource utilization. Note that CPU-based scaling only works when the containers declare CPU requests (as the Deployment above does) and the cluster has a metrics source such as metrics-server installed.
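    The autoscaling/v1 API used in the program only supports a CPU utilization target. If your cluster's metrics pipeline allows it, the autoscaling/v2 API lets you combine several signals. Here's a sketch of an equivalent v2 HPA; the memory target is an illustrative addition, not something the program above requires:

    import pulumi_kubernetes as k8s

    model_hpa_v2 = k8s.autoscaling.v2.HorizontalPodAutoscaler(
        "model-hpa-v2",
        spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name="model-deployment",  # Or pass model_deployment.metadata.name.
            ),
            min_replicas=3,
            max_replicas=10,
            metrics=[
                # Scale on average CPU utilization, as in the v1 version above.
                k8s.autoscaling.v2.MetricSpecArgs(
                    type="Resource",
                    resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                        name="cpu",
                        target=k8s.autoscaling.v2.MetricTargetArgs(
                            type="Utilization",
                            average_utilization=50,
                        ),
                    ),
                ),
                # Illustrative second signal: average memory utilization.
                k8s.autoscaling.v2.MetricSpecArgs(
                    type="Resource",
                    resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                        name="memory",
                        target=k8s.autoscaling.v2.MetricTargetArgs(
                            type="Utilization",
                            average_utilization=70,
                        ),
                    ),
                ),
            ],
        ))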

    Remember, you'll need access to a Kubernetes cluster, with kubectl configured to communicate with it, and Pulumi set up with the appropriate credentials to deploy resources to the cluster; running pulumi up then creates everything above. You should replace "model-image-repo/model:tag" with the actual image path of your machine learning model.
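
    If the program should target a specific cluster rather than whatever kubectl currently points at, you can pass an explicit Kubernetes provider to the resources. A minimal sketch, where the kubeconfig path and context name are placeholders for your own configuration:

    import pulumi
    import pulumi_kubernetes as k8s

    # Hypothetical explicit provider; "/path/to/kubeconfig" and the context
    # name are placeholders for your cluster's configuration.
    cluster_provider = k8s.Provider(
        "cluster-provider",
        kubeconfig=open("/path/to/kubeconfig").read(),
        context="my-cluster-context",
    )

    # Each resource then opts in to the provider, for example:
    # k8s.apps.v1.Deployment("model-deployment", ...,
    #     opts=pulumi.ResourceOptions(provider=cluster_provider))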