1. ELBv2-powered Canary Deployments for AI Services


    Canary deployments refer to the pattern of rolling out a release to a small subset of users or servers first, monitoring its behavior and performance, and then either rolling it out to everyone or rolling it back. This approach minimizes disruption for users and lets you validate the new version's stability in a live environment.

    To implement a canary deployment strategy for AI services on AWS with Elastic Load Balancing v2 (ELBv2), you typically manage traffic with an Application Load Balancer or a Network Load Balancer, which distributes incoming application traffic across multiple targets such as EC2 instances in multiple Availability Zones. For AI services, those targets can be the instances running your machine learning model inference servers.

    Pulumi provides infrastructure as code (IaC) capabilities to define and deploy cloud infrastructure for such a setup. To create a canary deployment, we can define a Pulumi program that provisions:

    • Two sets of compute resources, one for the stable version of your service and one for the canary release. These could be EC2 instances or ECS tasks depending on your setup.
    • An AWS Application Load Balancer (ALB) to manage the incoming traffic and direct a controlled percentage of it to the canary deployment.
    • Listener rules (or weighted target groups) on the ALB to control which requests, or what share of traffic, reach the canary release.

    Here's a simplified Pulumi program written in Python that demonstrates how you might set up such an infrastructure:

    import pulumi
    import pulumi_aws as aws

    # Provision a new EC2 instance which will serve as the stable version.
    stable_version = aws.ec2.Instance("stableVersion",
        instance_type="t2.micro",
        ami="ami-0c55b159cbfafe1f0",  # Replace with the correct AMI for your region/service.
        tags={
            "Name": "stable",
        },
    )

    # Provision a new EC2 instance which will serve as the canary version.
    canary_version = aws.ec2.Instance("canaryVersion",
        instance_type="t2.micro",
        ami="ami-0c55b159cbfafe1f0",  # Replace with the correct AMI. This can also represent your new AI service version.
        tags={
            "Name": "canary",
        },
    )

    # Create a new ALB for the canary deployment to direct traffic between stable and canary instances.
    alb = aws.lb.LoadBalancer("appLB",
        internal=False,
        load_balancer_type="application",
        security_groups=[sg.id],  # Assumes you have an existing security group 'sg'.
        subnets=subnet_ids,       # Provide your VPC subnet IDs.
    )

    # Create target groups for the stable and canary deployments.
    stable_tg = aws.lb.TargetGroup("stableTg",
        port=80,
        protocol="HTTP",
        vpc_id=vpc_id,  # Provide your VPC ID.
    )

    canary_tg = aws.lb.TargetGroup("canaryTg",
        port=80,
        protocol="HTTP",
        vpc_id=vpc_id,
    )

    # Register the instances with their respective target groups.
    stable_attachment = aws.lb.TargetGroupAttachment("stableAttachment",
        target_group_arn=stable_tg.arn,
        target_id=stable_version.id,
    )

    canary_attachment = aws.lb.TargetGroupAttachment("canaryAttachment",
        target_group_arn=canary_tg.arn,
        target_id=canary_version.id,
    )

    # Create a listener for the load balancer that forwards requests to the stable target group by default.
    listener = aws.lb.Listener("listener",
        load_balancer_arn=alb.arn,
        port=80,
        default_actions=[{
            "type": "forward",
            "target_group_arn": stable_tg.arn,
        }],
    )

    # Add a rule to the listener that diverts matching traffic to the canary target group.
    # Listener rules match request properties such as headers or path patterns; here,
    # requests carrying the header 'X-Canary: true' are routed to the canary instances.
    canary_rule = aws.lb.ListenerRule("canaryRule",
        listener_arn=listener.arn,
        actions=[{
            "type": "forward",
            "target_group_arn": canary_tg.arn,
        }],
        conditions=[{
            "http_header": {
                "http_header_name": "X-Canary",
                "values": ["true"],
            },
        }],
        priority=100,  # Lower numeric values take precedence.
    )

    # Export the DNS name of the load balancer to access the canary deployment.
    pulumi.export("load_balancer_dns", alb.dns_name)

    This program sets up the infrastructure required for a canary deployment:

    • Two EC2 instances represent the stable and canary versions of your service.
    • An Application Load Balancer (ALB) to manage traffic.
    • A default listener action that routes ordinary traffic to the stable target group.
    • A canary listener rule that routes requests carrying the X-Canary: true header to the canary target group.

    Make sure to replace placeholders such as the AMI ID, sg.id, subnet_ids, and vpc_id with actual values from your AWS setup.
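
    Once the stack is deployed, you can check which version answers a given request. The short sketch below is illustrative only: it assumes the instances serve HTTP on port 80 behind the ALB and uses a placeholder value for the exported load_balancer_dns output. It sends one ordinary request (matched by the default action) and one request carrying the X-Canary: true header (matched by the canary rule):

    import urllib.request

    # Placeholder: substitute the value of the exported 'load_balancer_dns' output.
    ALB_DNS = "appLB-1234567890.us-east-1.elb.amazonaws.com"

    # An ordinary request falls through to the default action (stable target group);
    # the X-Canary header matches the canary listener rule instead.
    requests_to_try = {
        "stable": urllib.request.Request(f"http://{ALB_DNS}/"),
        "canary": urllib.request.Request(f"http://{ALB_DNS}/", headers={"X-Canary": "true"}),
    }

    for name, req in requests_to_try.items():
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(name, resp.status, resp.read()[:80])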

    This is a basic example aimed at illustrating how you can use Pulumi to manage AWS resources for a canary deployment. In a practical scenario, you would also need to tune the traffic split, adjust health checks, set up monitoring and logging, and likely integrate with CI/CD pipelines to automate promotion of the canary instances.
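
    If you prefer a percentage-based split over the header-based rule shown above, an ALB listener's default forward action can target both groups with weights. The following is a minimal sketch, assuming the alb, stable_tg, and canary_tg resources from the earlier program; it would replace the port-80 listener defined there, since a load balancer allows only one listener per port:

    # Weighted default action: roughly 90% of requests reach the stable target
    # group and 10% reach the canary. Shifting the weights (and re-running
    # `pulumi up`) gradually promotes or rolls back the canary.
    weighted_listener = aws.lb.Listener("weightedListener",
        load_balancer_arn=alb.arn,
        port=80,
        default_actions=[{
            "type": "forward",
            "forward": {
                "target_groups": [
                    {"arn": stable_tg.arn, "weight": 90},
                    {"arn": canary_tg.arn, "weight": 10},
                ],
            },
        }],
    )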