1. Integrating Kong for ML Workload Canary Deployments


    Canary deployment is a release strategy in which a new version of a service first receives a small subset of users or traffic before proceeding to a full rollout. It is often used to reduce the risk of introducing a new version: traffic is shifted to it gradually while its performance is monitored.

    In your case, integrating Kong to manage ML workload canary deployments lets you control the flow of traffic to different versions of your service. By leveraging Kong's routing, service discovery, and load-balancing capabilities, you can incrementally divert traffic to the new version of your ML service under specific conditions.

    Here is a Pulumi Python program that uses the Kong provider to create the following resources for a canary deployment scenario:

    1. Service: This represents your machine learning service to which traffic needs to be diverted.
    2. Upstream: This defines a pool of backend services to which Kong will proxy traffic.
    3. Target: These represent the instances of your machine learning service. In a canary deployment, you'll have both the stable and the canary instances as targets in your upstream.
    4. Route: This specifies rules to determine how requests get sent to different services.

    In this example, we will create an upstream for a backend ML service and add two targets to that upstream: one for the stable release and one for the canary release of your ML workload. We will then route traffic to the upstream such that a portion of the traffic is diverted to the canary deployment.

    Let's define the resources we need:

    import pulumi
    import pulumi_kong as kong

    # Define an Upstream for the ML service.
    # This is the logical hostname Kong load-balances across before it
    # forwards requests to the backend (target) services.
    ml_upstream = kong.Upstream(
        "ml-upstream",
        name="ml-upstream",
    )

    # Define the ML service. Its host points at the Upstream's name rather
    # than a single backend, so Kong resolves it through the targets below
    # and the canary split takes effect.
    ml_service = kong.Service(
        "ml-service",
        name="ml-service",
        protocol="http",
        host="ml-upstream",  # resolved via the Upstream, not a single host
        port=80,
    )

    # Define two Targets within the Upstream: one for the stable and one
    # for the canary deployment. Traffic is split in proportion to weight.
    stable_target = kong.Target(
        "stable-target",
        target="stable-version.example.com:80",
        weight=4,  # ~80% of traffic to stable (4 out of 5)
        upstream_id=ml_upstream.id,
    )

    canary_target = kong.Target(
        "canary-target",
        target="canary-version.example.com:80",
        weight=1,  # ~20% of traffic to the canary (1 out of 5)
        upstream_id=ml_upstream.id,
    )

    # Define a Route that matches requests and forwards them to the Service
    # (and, through it, to the Upstream's targets)
    ml_route = kong.Route(
        "ml-route",
        protocols=["http"],
        methods=["GET"],
        paths=["/predict"],  # example endpoint for ML predictions
        service_id=ml_service.id,
    )

    # Export the URL at which the route is reachable; replace the placeholder
    # hostname with your Kong proxy's public DNS name.
    pulumi.export("api_gateway_url", ml_route.paths.apply(
        lambda paths: f"http://<your-kong-proxy>{paths[0]}"))

    Here's what each piece of the infrastructure does:

    • The ml_service defines the logical machine learning service that Kong proxies requests to; routes attach to it, and it forwards traffic onward.
    • The ml_upstream is where requests are load-balanced by Kong. It represents a single logical endpoint to which we can attach multiple targets, such as stable and canary.
    • stable_target and canary_target represent the actual backend endpoints that serve the stable and canary versions of your ML workload. By assigning different weights, we control how traffic is distributed between stable and canary.
    • The ml_route controls how requests reach the service. It matches HTTP GET requests to the ML prediction endpoint (/predict) and forwards them to the service, which in turn proxies to the upstream's targets.
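    Kong splits traffic across an upstream's targets in proportion to their weights. As a quick sanity check (a standalone sketch, not part of the Pulumi program), you can compute the share each target receives from the weights used above:

```python
def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Map Kong target weights to the fraction of traffic each target receives.

    Kong's load balancer picks a target with probability weight / sum(weights),
    so weights of 4 (stable) and 1 (canary) yield an 80/20 split.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one target must have a positive weight")
    return {name: weight / total for name, weight in weights.items()}

# The weights from the program above:
print(traffic_share({"stable": 4, "canary": 1}))  # {'stable': 0.8, 'canary': 0.2}
```

    Adjusting the ratio (for example, weights of 9 and 1) is all it takes to change the canary's traffic share.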

    In a real-world scenario, you would implement monitoring and automated rollback to ensure that any problematic changes in the canary deployment are quickly reverted. You might also use more sophisticated traffic routing based on user characteristics or other request attributes, but that requires additional configuration within Kong.
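    One common way to drive such a progressive rollout is to step the canary's traffic percentage through a fixed schedule, rolling back to zero whenever a health check fails. The sketch below is illustrative only: the schedule and the health signal are assumptions, and the resulting percentage would be fed back into the Target weights (for example, via Pulumi stack configuration):

```python
ROLLOUT_STEPS = [5, 20, 50, 100]  # canary traffic percentages (assumed schedule)

def next_canary_percent(current: int, canary_healthy: bool) -> int:
    """Return the canary traffic percentage for the next rollout step.

    A healthy canary advances to the next step in ROLLOUT_STEPS; an unhealthy
    one rolls back to 0% so all traffic returns to the stable version.
    """
    if not canary_healthy:
        return 0
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return 100  # already fully promoted

print(next_canary_percent(5, True))    # 20
print(next_canary_percent(50, False))  # 0
```

    How "healthy" is determined (error rates, latency percentiles, prediction-quality metrics) is up to your monitoring stack and is outside what Kong itself provides.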

    Remember, before running this Pulumi program, you'll need to set up the Pulumi CLI, point the Kong provider at your Kong Admin API, and install the provider plugin. After deploying the resources, you can access the Kong-managed endpoint at the exported URL and observe how traffic is distributed across your ML service versions.
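    As a rough setup sketch, a typical sequence looks like the following. The package, plugin, and environment-variable names are assumptions based on the community Kong provider; verify them against the provider's README and the Pulumi registry before running.

```shell
# Install the Pulumi Python SDK and the community Kong provider SDK
pip install pulumi pulumi_kong

# Point the provider at your Kong Admin API (env var used by the
# community provider; confirm in its documentation)
export KONG_ADMIN_ADDR=http://localhost:8001

pulumi login            # authenticate with your Pulumi backend
pulumi stack init dev   # create a stack for this deployment
pulumi up               # preview and deploy the Kong resources
```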