Multiple Model Endpoint Routing with AWS Load Balancer Target Groups

Question

Pulumi · Accepted Answer

To route requests to multiple model endpoints behind a single AWS Application Load Balancer, give each endpoint its own target group and add listener rules that forward each request to the right group. Here's how to set this up with Pulumi and AWS:

Load Balancer (ALB): The Application Load Balancer will receive incoming traffic and distribute it across multiple target groups based on the defined listener rules.
Target Groups: Each model endpoint will be associated with its own target group. A target group allows you to route requests to one or more registered targets, such as EC2 instances, containers, or IP addresses.
Listener Rules: We will define rules on the load balancer's listener to check for conditions such as path patterns or host headers and forward the requests to the corresponding target group based on these conditions.
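Listener rules only choose a target group. Within a group, the ALB spreads requests across the registered targets that pass health checks, using round robin by default. If every target in a group is unhealthy, the ALB fails open and routes to all of them anyway. The toy model below illustrates that behavior in plain Python. It is not AWS code: `ToyTargetGroup` and the IP addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTargetGroup:
    """Hypothetical stand-in for an ALB target group (not an AWS API)."""
    name: str
    targets: list[str]
    unhealthy: set[str] = field(default_factory=set)
    _next: int = 0

    def pick(self) -> str:
        """Round-robin over healthy targets; fail open if none are healthy."""
        healthy = [t for t in self.targets if t not in self.unhealthy]
        candidates = healthy or self.targets  # fail open: fall back to every target
        target = candidates[self._next % len(candidates)]
        self._next += 1
        return target


group_a = ToyTargetGroup("model-a", ["10.0.1.10", "10.0.2.10", "10.0.3.10"])
group_a.unhealthy.add("10.0.2.10")  # simulate a failed health check
print([group_a.pick() for _ in range(4)])
# → ['10.0.1.10', '10.0.3.10', '10.0.1.10', '10.0.3.10']
```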

Here's a Pulumi program in Python that shows how you can define these resources. For simplicity, we assume that the EC2 instances or containers serving the model endpoints are already running.

import pulumi
import pulumi_aws as aws

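# Look up the default VPC and its subnets. Substitute your own VPC and
# public subnets if you are not deploying into the default VPC.
vpc = aws.ec2.get_vpc(default=True)
subnets = aws.ec2.get_subnets(filters=[
    aws.ec2.GetSubnetsFilterArgs(name="vpc-id", values=[vpc.id]),
])

# Security group that lets HTTP traffic reach the load balancer and lets
# the load balancer reach its targets.
lb_security_group = aws.ec2.SecurityGroup("lbSecurityGroup",
    vpc_id=vpc.id,
    ingress=[{
        "protocol": "tcp",
        "from_port": 80,
        "to_port": 80,
        "cidr_blocks": ["0.0.0.0/0"],
    }],
    egress=[{
        "protocol": "-1",
        "from_port": 0,
        "to_port": 0,
        "cidr_blocks": ["0.0.0.0/0"],
    }],
)

# Create an internet-facing Application Load Balancer.
load_balancer = aws.lb.LoadBalancer("loadBalancer",
    subnets=subnets.ids,  # An ALB needs subnets in at least two Availability Zones.
    load_balancer_type="application",
    security_groups=[lb_security_group.id],
    enable_http2=True,
)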

# Create a target group for Model A Endpoint.
target_group_model_a = aws.lb.TargetGroup("targetGroupModelA",
    port=80,
    protocol="HTTP",
    vpc_id=vpc.id,
    health_check={
        "path": "/health",  # Path for the health check endpoint.
        "interval": 30,
    },
)

# Create a target group for Model B Endpoint.
target_group_model_b = aws.lb.TargetGroup("targetGroupModelB",
    port=80,
    protocol="HTTP",
    vpc_id=vpc.id,
    health_check={
        "path": "/health",
        "interval": 30,
    },
)

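# Create an HTTP listener on port 80. A listener requires a default action;
# requests that match no listener rule get a plain 404 response.
listener = aws.lb.Listener("listener",
    load_balancer_arn=load_balancer.arn,
    port=80,
    protocol="HTTP",
    default_actions=[{
        "type": "fixed-response",
        "fixed_response": {
            "content_type": "text/plain",
            "message_body": "Not found",
            "status_code": "404",
        },
    }],
)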

# Create a listener rule to route traffic to Model A Endpoint based on the path.
listener_rule_model_a = aws.lb.ListenerRule("listenerRuleModelA",
    actions=[{
        "type": "forward",
        "target_group_arn": target_group_model_a.arn,
    }],
    conditions=[{
        "path_pattern": {
            "values": ["/model-a*"],
        },
    }],
    listener_arn=listener.arn,
    priority=10,
)

# Create a listener rule to route traffic to Model B Endpoint based on the path.
listener_rule_model_b = aws.lb.ListenerRule("listenerRuleModelB",
    actions=[{
        "type": "forward",
        "target_group_arn": target_group_model_b.arn,
    }],
    conditions=[{
        "path_pattern": {
            "values": ["/model-b*"],
        },
    }],
    listener_arn=listener.arn,
    priority=20,
)

# Export the Load Balancer DNS name to access the model endpoints.
pulumi.export("load_balancer_dns_name", load_balancer.dns_name)

In this program:

A LoadBalancer resource is created to handle incoming requests. We specify that it's an application load balancer with load_balancer_type="application", and it is internet-facing by default.
Two TargetGroup resources, one per model endpoint. Each needs a protocol, port, and vpc_id; the health_check block sets the path and the interval (in seconds) the ALB uses to probe the targets.
A Listener is attached to the load balancer to accept HTTP traffic on port 80.
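Two ListenerRule resources are defined, one for Model A with path_pattern "/model-a*" and one for Model B with path_pattern "/model-b*". Each rule forwards matching requests to its target group based on the URL path. Path patterns are case-sensitive, and the trailing * also matches paths such as /model-abc; use the values ["/model-a", "/model-a/*"] instead if that matters for your naming.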
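To see what a pattern like "/model-a*" actually matches, here is a small approximation of ALB path-pattern semantics in plain Python: matching is case-sensitive, * matches zero or more characters, ? matches exactly one, and only the path (not the query string) is compared. The `alb_path_matches` helper is hypothetical and only models the AWS behavior.

```python
import re


def alb_path_matches(pattern: str, path: str) -> bool:
    """Approximate ALB path_pattern matching: case-sensitive,
    '*' matches zero or more characters, '?' matches exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    # Only the path is compared, so drop any query string first.
    path_only = path.split("?", 1)[0]
    return re.fullmatch(regex, path_only, flags=re.DOTALL) is not None


for p in ["/model-a", "/model-a/predict", "/model-abc", "/Model-A", "/model-b"]:
    print(p, alb_path_matches("/model-a*", p))
```

The third case shows why "/model-a*" can catch more than intended, and the fourth shows the case sensitivity.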

Adapt the subnets and security_groups values to your own network: an internet-facing ALB needs public subnets in at least two different Availability Zones, and its security group must allow inbound HTTP on port 80. Likewise, point the health_check path at whatever health endpoint each model server actually exposes.
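An ALB accepts only one subnet per Availability Zone and needs at least two zones. If you pick subnets yourself rather than passing every subnet of a VPC, a small pre-deployment check can catch both mistakes. This is a hypothetical helper that works on plain (subnet_id, availability_zone) pairs, however you obtain them; the IDs below are made up.

```python
def pick_alb_subnets(subnets: list[tuple[str, str]]) -> list[str]:
    """Pick one subnet per Availability Zone from (subnet_id, az) pairs.

    Raises ValueError if the subnets cover fewer than two AZs, which an
    Application Load Balancer does not accept.
    """
    per_az: dict[str, str] = {}
    for subnet_id, az in subnets:
        per_az.setdefault(az, subnet_id)  # keep the first subnet seen in each AZ
    if len(per_az) < 2:
        raise ValueError(f"ALB needs subnets in at least 2 AZs, got {sorted(per_az)}")
    return list(per_az.values())


print(pick_alb_subnets([
    ("subnet-aaa", "us-east-1a"),
    ("subnet-bbb", "us-east-1a"),
    ("subnet-ccc", "us-east-1b"),
]))
# → ['subnet-aaa', 'subnet-ccc']
```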

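The priority value determines the order in which the listener evaluates its rules: lower numbers are evaluated first, the first matching rule wins, and each priority must be unique within a listener. Requests that match no rule are handled by the listener's default action.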
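Priority matters as soon as patterns overlap. The sketch below models first-match evaluation in plain Python; the `Rule` class, the `route` function, and the extra "/model-a/v2*" canary rule are hypothetical, and the default action is assumed to be a fixed 404 response.

```python
import re
from dataclasses import dataclass


def _matches(pattern: str, path: str) -> bool:
    # '*' = zero or more chars, '?' = exactly one; case-sensitive like ALB path patterns.
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, path, flags=re.DOTALL) is not None


@dataclass(frozen=True)
class Rule:
    priority: int
    path_patterns: tuple[str, ...]  # values in one condition are ORed
    target_group: str


def route(rules: list[Rule], path: str, default: str = "fixed-response 404") -> str:
    """Evaluate rules from the lowest priority number upward; first match wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if any(_matches(p, path) for p in rule.path_patterns):
            return rule.target_group
    return default


rules = [
    Rule(10, ("/model-a*",), "targetGroupModelA"),
    Rule(20, ("/model-b*",), "targetGroupModelB"),
    Rule(5, ("/model-a/v2*",), "targetGroupModelAv2"),  # hypothetical canary rule
]
print(route(rules, "/model-a/v2/predict"))  # → targetGroupModelAv2
print(route(rules, "/model-a/predict"))     # → targetGroupModelA
print(route(rules, "/healthz"))             # → fixed-response 404
```

Giving the more specific canary rule the lower number is what keeps "/model-a*" from swallowing its traffic.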

Finally, we export the DNS name of the load balancer, which you can use to access your model endpoints from the internet.

This setup routes requests for http://<load_balancer_dns_name>/model-a to the Model A endpoint and requests for http://<load_balancer_dns_name>/model-b to the Model B endpoint. These rules forward the request path unchanged, so each model server must itself serve paths under its /model-a or /model-b prefix. To view the Load Balancer DNS name after deployment, run pulumi stack output load_balancer_dns_name.
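Once you have the DNS name, a quick smoke test is to request each model prefix and look at the status codes. A minimal standard-library sketch; the `check_routes` helper is hypothetical, and it assumes each model server answers a plain GET on its prefix.

```python
import urllib.error
import urllib.request


def check_routes(base_url: str, paths: list[str], timeout: float = 5.0) -> dict[str, int]:
    """GET each path under base_url and return {path: HTTP status code}."""
    results: dict[str, int] = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a code
            results[path] = err.code
    return results


# Usage, with the host taken from the stack output:
# print(check_routes("http://<load_balancer_dns_name>", ["/model-a", "/model-b"]))
```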

Ensure that the target EC2 instances or containers are registered with their respective target groups (for example with aws.lb.TargetGroupAttachment, or automatically through an ECS service or Auto Scaling group) and that they respond to the health check path you've specified.
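For the health checks to pass, each target must answer the configured path (/health here) with a success code, which is HTTP 200 by default. Because the listener rules in this program do not rewrite the path, the target also receives the full /model-a... path. Below is a minimal standard-library sketch of such a target for Model A; `ModelHandler`, `serve`, and the echo-style JSON response are hypothetical placeholders for a real model server.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PREFIX = "/model-a"  # hypothetical: the path prefix routed to this target group


class ModelHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            # Target-group health check: must return a success code (200).
            self._send_json(200, {"status": "ok"})
        elif self.path == PREFIX or self.path.startswith(PREFIX + "/"):
            # The ALB forwards the full path, so strip the prefix here.
            sub_path = self.path[len(PREFIX):] or "/"
            self._send_json(200, {"model": "a", "path": sub_path})
        else:
            self._send_json(404, {"error": "not found"})


def serve(port: int = 80) -> None:
    """Listen on all interfaces; port 80 matches the target group's port."""
    ThreadingHTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

Calling serve() on the instance starts the server; binding port 80 usually requires elevated privileges, so you may prefer a higher port and a matching target group port.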