Highly Available Model Serving with AWS ALB

Question

Pulumi · Accepted Answer

To achieve a highly available model serving infrastructure using AWS Application Load Balancer (ALB), we can create a Pulumi program that sets up the ALB along with the necessary supporting resources. The core components will include the ALB, Target Groups, and Listeners.

1. **AWS ALB (Application Load Balancer)** - An ALB efficiently distributes incoming application traffic across multiple targets, such as EC2 instances, in multiple Availability Zones, increasing the fault tolerance of your applications.

2. **Target Groups** - A target group routes requests to one or more registered targets. When you create a listener, you specify a target group for its default action; additional listener rules can forward to other target groups. Health checks are configured at the target group level, so only targets that pass them receive traffic.

3. **Listeners** - A listener checks for connection requests from clients, using the protocol and port that you configure. The rules that you define for a listener determine how the load balancer routes requests to the targets in one or more target groups.
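To make the relationship between these three components concrete, here is a toy model in plain Python. It is not how AWS implements an ALB and it uses no AWS API; the class names `TargetGroup` and `Listener` and the instance IDs are made up for illustration. It only shows the idea that a listener forwards to a target group, and that the target group skips targets marked unhealthy.

```python
class TargetGroup:
    """Toy target group: registered targets plus a health flag for each."""

    def __init__(self, targets):
        self.health = {target: True for target in targets}
        self._next = 0  # position of the next round-robin candidate

    def mark(self, target, healthy):
        """Record the result of a (simulated) health check."""
        self.health[target] = healthy

    def pick(self):
        """Round-robin over healthy targets only.

        Returns None when no target is healthy. This is a simplification;
        a real load balancer has its own behavior in that situation.
        """
        targets = list(self.health)
        for _ in range(len(targets)):
            candidate = targets[self._next % len(targets)]
            self._next += 1
            if self.health[candidate]:
                return candidate
        return None


class Listener:
    """Toy listener: accepts requests on a port and forwards to a target group."""

    def __init__(self, port, default_target_group):
        self.port = port
        self.default_target_group = default_target_group

    def route(self):
        return self.default_target_group.pick()


# Hypothetical instance IDs, one per Availability Zone.
group = TargetGroup(['i-aaa', 'i-bbb'])
listener = Listener(80, group)

first = [listener.route() for _ in range(4)]

# Simulate one instance failing its health check.
group.mark('i-aaa', False)
after_failure = [listener.route() for _ in range(3)]

print(first)
print(after_failure)
```

With both targets healthy, requests alternate between them; once one is marked unhealthy, all requests go to the remaining target. That is why registering at least two targets in different Availability Zones matters for availability.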

Below is a Pulumi program written in Python that creates a simple AWS ALB, a listener, and a target group. The program assumes you already have EC2 instances running and ready to serve traffic. If you don't, you'll need to create them and configure their security groups to allow inbound traffic from the ALB on the serving port.

Here's a step-by-step guide on what the code accomplishes:

1. **Creates an ALB** - The ALB is created in the default VPC's subnets, behind a security group that allows inbound HTTP traffic on port 80.
2. **Defines a Target Group** - A target group is created, which will include the EC2 instances that need to serve traffic.
3. **Sets up a Listener** - A listener is set up to listen on port 80 (HTTP), which forwards requests to the previously defined target group.

```python
import pulumi
import pulumi_aws as aws

# Create a new security group for the ALB to allow HTTP traffic
alb_security_group = aws.ec2.SecurityGroup('albSecurityGroup',
    description='Allow HTTP inbound traffic',
    ingress=[
        {
            'from_port': 80,
            'to_port': 80,
            'protocol': 'tcp',
            'cidr_blocks': ['0.0.0.0/0'],
        },
    ],
    egress=[
        {
            'from_port': 0,
            'to_port': 0,
            'protocol': '-1',  # Allow all outbound traffic
            'cidr_blocks': ['0.0.0.0/0'],
        },
    ])

# Look up the default VPC and its subnets. An ALB needs subnets in at least
# two Availability Zones.
default_vpc = aws.ec2.get_vpc(default=True)
default_subnets = aws.ec2.get_subnets(
    filters=[aws.ec2.GetSubnetsFilterArgs(name='vpc-id', values=[default_vpc.id])])

# Provision a new ALB in the default VPC
alb = aws.lb.LoadBalancer('app-lb',
    security_groups=[alb_security_group.id],
    subnets=default_subnets.ids,
    load_balancer_type='application')

# Define the port and protocol for a new target group.
# Here, we're assuming your servers are serving on port 8080 via HTTP
target_group = aws.lb.TargetGroup('app-tg',
    port=8080,
    protocol='HTTP',
    target_type='instance',
    vpc_id=default_vpc.id)

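# Register EC2 instances with the target group (replace the placeholder with
# your actual instance IDs). For high availability, list at least two
# instances in different Availability Zones.
instance_ids = ['i-xxxxxxxxxxxxxxxxx']  # Replace with your EC2 instance IDs
target_group_attachments = [
    aws.lb.TargetGroupAttachment(f'app-tg-attachment-{index}',
        target_group_arn=target_group.arn,
        target_id=instance_id,
        port=8080)
    for index, instance_id in enumerate(instance_ids)
]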

# Create a listener for the ALB to listen to HTTP requests
listener = aws.lb.Listener('app-listener',
    load_balancer_arn=alb.arn,
    port=80,
    default_actions=[{
        'type': 'forward',
        'target_group_arn': target_group.arn,
    }])

# Export the DNS name of the ALB
pulumi.export('alb_dns_name', alb.dns_name)
```

This program sets up a basic load balancer configuration. In practice, you will likely want HTTPS for secure communication. That requires a TLS certificate (for example, one issued through AWS Certificate Manager), a listener on port 443 that references the certificate, and a security group rule that allows inbound traffic on port 443.
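As a rough sketch of the HTTPS variant, the helper below builds keyword arguments for two listeners: one that terminates HTTPS on port 443 and forwards to the target group, and one that redirects plain HTTP on port 80 to HTTPS. The helper name `https_listener_args` is hypothetical, and `certificate_arn` stands for a certificate you have already issued; the dictionary keys follow the `aws.lb.Listener` inputs as I understand them, so check them against your provider version.

```python
def https_listener_args(load_balancer_arn, target_group_arn, certificate_arn):
    """Return (https_args, redirect_args) keyword dictionaries for two listeners.

    Hypothetical helper: it only assembles plain dictionaries and does not
    create any resources itself.
    """
    https_args = {
        'load_balancer_arn': load_balancer_arn,
        'port': 443,
        'protocol': 'HTTPS',
        'certificate_arn': certificate_arn,  # Assumed to exist already
        'default_actions': [{
            'type': 'forward',
            'target_group_arn': target_group_arn,
        }],
    }
    redirect_args = {
        'load_balancer_arn': load_balancer_arn,
        'port': 80,
        'protocol': 'HTTP',
        'default_actions': [{
            'type': 'redirect',
            'redirect': {
                'port': '443',
                'protocol': 'HTTPS',
                'status_code': 'HTTP_301',  # Permanent redirect
            },
        }],
    }
    return https_args, redirect_args


# Example with placeholder ARNs; in a Pulumi program you would pass
# alb.arn and target_group.arn instead.
https_args, redirect_args = https_listener_args(
    'alb-arn-placeholder', 'tg-arn-placeholder', 'cert-arn-placeholder')
print(https_args['port'], redirect_args['default_actions'][0]['type'])
```

In the program above, these dictionaries could be unpacked into the resource constructor, for example `aws.lb.Listener('app-https-listener', **https_args)`, with the redirect listener taking the place of the plain HTTP forward listener. The ALB's security group would also need an ingress rule for port 443.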

Comments in the program explain the role of each resource. The setup becomes highly available once at least two healthy instances in different Availability Zones are registered with the target group; from there, you can customize it further to your specific requirements.