AI Model Serving with Kubernetes Kong Ingress Controller

    To create an AI model serving infrastructure with Kubernetes using the Kong Ingress Controller, you would need to do the following:

    1. Set up a Kubernetes cluster, if you don't already have one (a minimal cluster sketch follows this list).
    2. Deploy Kong Ingress Controller to the cluster. Kong will manage external access to the services in a Kubernetes cluster.
    3. Create and deploy your AI model serving application as a Deployment resource and expose it using a Service resource within Kubernetes.
    4. Define Ingress resources that tell Kong how to route external HTTP traffic to your AI model serving application.
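
    If you don't have a cluster yet, one option is to provision a managed cluster with Pulumi as well. Below is a minimal sketch using the pulumi_eks package; the cluster name and node-group sizing are illustrative, and it assumes your AWS credentials are already configured:

        import pulumi
        import pulumi_eks as eks

        # Provision a small managed EKS cluster (illustrative sizing).
        cluster = eks.Cluster(
            'ai-serving-cluster',
            desired_capacity=2,
            min_size=1,
            max_size=3,
        )

        # The resulting kubeconfig can feed the Kubernetes provider used below.
        pulumi.export('kubeconfig', cluster.kubeconfig)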

    Here's a breakdown of how you would accomplish this with Pulumi:

    • Kubernetes cluster: You can create a cluster using a managed Kubernetes service such as Amazon EKS, Google GKE, or Azure AKS, or define it with Kubernetes resources directly if you're running on-premises or on VMs (the optional EKS sketch above shows one way to do this with Pulumi). The main program below does not create a cluster; it assumes one already exists and that your kubeconfig points to it.

    • Kong Ingress Controller: You would deploy this to your cluster. Kong provides an official Helm chart that makes it simple to install. With Pulumi, you can use the pulumi_kubernetes.helm.v3.Chart class to deploy this chart.

    • AI Model serving application: This should be containerized; if it isn't already, you need to build a Docker image and push it to a container registry (a build-and-push sketch follows these bullets). You then define a Deployment and a Service in Kubernetes to run and expose the application. The main program below assumes the image already exists.

    • Ingress resource: You will define an Ingress resource using Pulumi's Kubernetes SDK, which will use Kong as the ingress controller to route traffic to your AI Model serving application.
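
    If the application isn't containerized yet, Pulumi can build and push the image too. Here is a minimal sketch using the pulumi_docker provider (v4-style arguments); the build context and image name are hypothetical, and it assumes you are already authenticated against the registry:

        import pulumi
        import pulumi_docker as docker

        # Build the model server image from a local Dockerfile and push it to a
        # registry (hypothetical context path and image name).
        image = docker.Image(
            'ai-model-image',
            build=docker.DockerBuildArgs(context='./app'),
            image_name='registry.example.com/ai-model:v1',
        )

        # Reference this in the Deployment's container spec instead of 'your-docker-image'.
        pulumi.export('image_name', image.image_name)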

    Here's a basic Pulumi program in Python that sets up the Kong Ingress Controller and a placeholder AI model serving application:

    import pulumi
    import pulumi_kubernetes as k8s
    from pulumi_kubernetes.helm.v3 import Chart, ChartOpts, FetchOpts

    # Initialize a Kubernetes provider using the kubeconfig stored in Pulumi config.
    k8s_provider = k8s.Provider(
        'k8s-provider',
        kubeconfig=pulumi.Config('kubernetes').require('kubeconfig'),
    )

    # Deploy the Kong Ingress Controller from Kong's official Helm chart repository.
    # Refer to the Kong documentation for the chart version appropriate to your cluster.
    kong_ingress_controller = Chart(
        'kong',
        ChartOpts(
            chart='kong',
            version='2.5.0',  # Pin the chart version you wish to deploy
            namespace='kube-system',  # Or choose a dedicated namespace
            fetch_opts=FetchOpts(
                repo='https://charts.konghq.com',  # Kong Helm chart repository
            ),
        ),
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )

    # Placeholder Deployment for the AI model serving application.
    # Replace 'your-docker-image' with the image of your application.
    ai_model_deployment = k8s.apps.v1.Deployment(
        'ai-model-deployment',
        spec={
            'selector': {'matchLabels': {'app': 'ai-model'}},
            'replicas': 1,
            'template': {
                'metadata': {'labels': {'app': 'ai-model'}},
                'spec': {
                    'containers': [{
                        'name': 'ai-model',
                        'image': 'your-docker-image',
                        'ports': [{'containerPort': 8080}],  # Match your app's listening port
                    }],
                },
            },
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )

    # ClusterIP Service exposing the Deployment inside the cluster; Kong routes
    # external traffic to it, so a LoadBalancer type isn't needed here.
    ai_model_service = k8s.core.v1.Service(
        'ai-model-service',
        spec={
            'selector': {'app': 'ai-model'},
            'ports': [{'port': 80, 'targetPort': 8080}],  # Adjust to your app's configuration
            'type': 'ClusterIP',
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )

    # Ingress resource that tells Kong how to route incoming HTTP traffic.
    # On newer clusters you may prefer spec.ingressClassName='kong' over the
    # legacy annotation used below.
    ai_model_ingress = k8s.networking.v1.Ingress(
        'ai-model-ingress',
        metadata={
            'annotations': {
                'kubernetes.io/ingress.class': 'kong',  # Kong handles this Ingress
            },
        },
        spec={
            'rules': [{
                'http': {
                    'paths': [{
                        'path': '/ai-model',  # Path for reaching the AI model serving app
                        'pathType': 'Prefix',  # The type of path matching to use
                        'backend': {
                            'service': {
                                'name': ai_model_service.metadata['name'],
                                'port': {'number': 80},  # Must match the Service port above
                            },
                        },
                    }],
                },
            }],
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )

    # Export the hostname the load balancer assigns to the Ingress; this may be
    # empty until the cloud provider finishes provisioning it.
    pulumi.export(
        'ai_model_url',
        ai_model_ingress.status.apply(
            lambda status: status.load_balancer.ingress[0].hostname
            if status and status.load_balancer and status.load_balancer.ingress
            else None
        ),
    )

    Remember to replace 'your-docker-image' with the image URL of your AI model serving application, and adjust the ports to match the port your application actually listens on.
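
    If you need a stand-in for the model server itself, here is a minimal sketch using Flask; the endpoint path and payload shape are hypothetical. It listens on port 8080 to match the Service's targetPort, and serves under the /ai-model prefix since Kong typically forwards the request path unchanged (set the konghq.com/strip-path annotation on the Ingress if you want the prefix removed):

        from flask import Flask, jsonify, request

        app = Flask(__name__)

        # Hypothetical prediction endpoint; replace the echo with real inference.
        @app.route('/ai-model/predict', methods=['POST'])
        def predict():
            payload = request.get_json(force=True)
            return jsonify({'inputs': payload, 'prediction': None})

        if __name__ == '__main__':
            # Listen on 8080 to match the Service's targetPort.
            app.run(host='0.0.0.0', port=8080)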

    Explanation

    • We import Pulumi's Kubernetes SDK to interact with Kubernetes resources.
    • We define a Kubernetes provider instance using the current context from the kubeconfig file.
    • The Kong Ingress Controller is deployed using its Helm chart, specifying the chart version and attaching it to the Kubernetes provider.
    • Next, we define a placeholder Deployment and Service for the AI model application, specifying the Docker image, ports, and other details.
    • We create an Ingress resource that tells the Kong Ingress Controller how to route external HTTP traffic to the AI model serving application, based on the specified path and annotations.
    • Finally, we export the AI model serving application's URL, resolved from the hostname that the Kong-managed load balancer reports on the Ingress after deployment (this may be empty until the load balancer is provisioned).
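
    As a quick sanity check after deployment, you can send a request through Kong once the exported hostname resolves. A small client sketch, assuming the requests library and the hypothetical endpoint and payload shape from the Flask example above:

        import requests

        def check_model_endpoint(hostname: str) -> None:
            # hostname comes from `pulumi stack output ai_model_url`.
            url = f'http://{hostname}/ai-model/predict'
            resp = requests.post(url, json={'inputs': [[1.0, 2.0, 3.0]]}, timeout=10)
            resp.raise_for_status()
            print(resp.json())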

    Once you run this Pulumi program with a real image in place, your AI model serving application will be reachable through Kong acting as the ingress controller.