High-Availability TensorFlow Serving with ELB

Question

Pulumi · Accepted Answer

To create a high-availability TensorFlow Serving setup with an Elastic Load Balancer (ELB) using Pulumi, you'd generally follow these steps:

1. **Set up an Auto Scaling Group**: This allows you to have a cluster of EC2 instances that can grow or shrink based on demand, ensuring high availability and fault tolerance for your TensorFlow Serving application.
2. **Create an Elastic Load Balancer**: The ELB will automatically distribute incoming application traffic across the EC2 instances in the Auto Scaling Group.
3. **Configure Health Checks**: Proper health checks will ensure that the ELB can detect if an instance becomes unhealthy and stop sending traffic to it, rerouting traffic to healthy instances instead.
4. **Set up Security Groups**: Security groups act as virtual firewalls for your EC2 instances, controlling both inbound and outbound traffic at the instance level.

Below is a Pulumi program written in Python that sets up this architecture on AWS. The program provisions an Auto Scaling Group of EC2 instances running TensorFlow Serving and places an ELB in front of it to distribute incoming traffic. It assumes you have an AMI with TensorFlow Serving already installed.

```python
import pulumi
import pulumi_aws as aws

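# TensorFlow Serving defaults: port 8500 serves gRPC, port 8501 serves the REST API.
# The ELB below uses an HTTP listener and an HTTP health check, so it targets the REST port.
TF_SERVING_REST_PORT = 8501

# Placeholder: the name of the model your TensorFlow Serving instances load
MODEL_NAME = 'my_model'

# Replace with your VPC subnets, ideally in at least two Availability Zones
subnet_ids = ['subnet-12345', 'subnet-67890']

# Security Group for the load balancer: HTTP in from anywhere, all traffic out.
# Set vpc_id on both security groups if your subnets are not in the default VPC.
tf_serving_elb_security_group = aws.ec2.SecurityGroup('tf-serving-elb-sg',
    description='Allow HTTP traffic to the TensorFlow Serving ELB',
    ingress=[
        {'protocol': 'tcp', 'from_port': 80, 'to_port': 80, 'cidr_blocks': ['0.0.0.0/0']},
    ],
    egress=[
        {'protocol': '-1', 'from_port': 0, 'to_port': 0, 'cidr_blocks': ['0.0.0.0/0']},
    ])

# Security Group for the EC2 instances: only the ELB may reach the serving port
tf_serving_security_group = aws.ec2.SecurityGroup('tf-serving-sg',
    description='Allow TensorFlow Serving traffic from the ELB',
    ingress=[
        {'protocol': 'tcp', 'from_port': TF_SERVING_REST_PORT, 'to_port': TF_SERVING_REST_PORT,
         'security_groups': [tf_serving_elb_security_group.id]},
    ],
    egress=[
        {'protocol': '-1', 'from_port': 0, 'to_port': 0, 'cidr_blocks': ['0.0.0.0/0']},
    ])

# Create a launch configuration for the Auto Scaling Group
# It uses an existing AMI with TensorFlow Serving pre-installed
tf_serving_launch_configuration = aws.ec2.LaunchConfiguration('tf-serving-launch-config',
    image_id='ami-12345',  # Replace this with your TensorFlow Serving AMI ID
    instance_type='t2.medium',  # Choose an instance type appropriate for your needs
    security_groups=[tf_serving_security_group.id])

# Create an Auto Scaling Group
tf_serving_auto_scaling_group = aws.autoscaling.Group('tf-serving-asg',
    desired_capacity=2,
    min_size=1,
    max_size=3,
    launch_configuration=tf_serving_launch_configuration.name,
    vpc_zone_identifiers=subnet_ids,
    # Replace instances that fail the ELB health check, not only EC2 status checks
    health_check_type='ELB',
    health_check_grace_period=300)  # Seconds to let TensorFlow Serving start up

# Create an Elastic Load Balancer
tf_serving_load_balancer = aws.elb.LoadBalancer('tf-serving-elb',
    security_groups=[tf_serving_elb_security_group.id],
    subnets=subnet_ids,
    listeners=[
        {'instance_port': TF_SERVING_REST_PORT, 'instance_protocol': 'http',
         'lb_port': 80, 'lb_protocol': 'http'},
    ],
    health_check={
        'healthy_threshold': 2,
        'unhealthy_threshold': 2,
        'timeout': 3,
        # The model status endpoint returns 200 once the model is loaded
        'target': f'HTTP:{TF_SERVING_REST_PORT}/v1/models/{MODEL_NAME}',
        'interval': 30,
    })

# Associate the ASG with the ELB
tf_serving_asg_attachment = aws.autoscaling.Attachment('tf-serving-asg-attachment',
    autoscaling_group_name=tf_serving_auto_scaling_group.name,
    elb=tf_serving_load_balancer.name)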

# Export the DNS name of the ELB to access the TensorFlow Serving application
pulumi.export('tf_serving_elb_dns', tf_serving_load_balancer.dns_name)
```
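The health check settings in the program (a 30-second interval, an unhealthy threshold of 2, and a 3-second timeout) determine how quickly the ELB stops routing to a failed instance. As a rough estimate only, ignoring network latency and ELB internals, the detection window can be sketched like this; the function name is hypothetical and not part of any AWS or Pulumi API:

```python
def detection_window_seconds(interval, unhealthy_threshold, timeout):
    """Rough bounds on how long an instance can be failing before the ELB
    marks it unhealthy (an estimate, not an AWS guarantee)."""
    # Best case: the instance fails just before a check, so the first
    # failure is seen immediately and the rest follow one interval apart.
    earliest = (unhealthy_threshold - 1) * interval
    # Worst case: the instance fails just after a successful check, and the
    # final failing check only completes when it times out.
    latest = unhealthy_threshold * interval + timeout
    return earliest, latest


print(detection_window_seconds(interval=30, unhealthy_threshold=2, timeout=3))  # (30, 63)
```

Lowering `interval` shortens this window at the cost of more health check traffic to each instance.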

In this program:

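- We use `aws.ec2.SecurityGroup` to control which traffic can reach the TensorFlow Serving port. By default, TensorFlow Serving listens on port 8500 for gRPC and on port 8501 for its REST API, so an HTTP listener and an HTTP health check must target the REST port.
- An `aws.ec2.LaunchConfiguration` resource is created specifying the AMI and the type of EC2 instance. AWS now recommends launch templates over launch configurations, so consider `aws.ec2.LaunchTemplate` for new deployments.
- `aws.autoscaling.Group` represents our Auto Scaling Group, which keeps between 1 and 3 instances of our serving application running. Because the program defines no scaling policy, the group holds the desired capacity of 2 and replaces failed instances; attach an `aws.autoscaling.Policy` to scale with demand.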
- The `aws.elb.LoadBalancer` creates a new Elastic Load Balancer to distribute traffic across instances within the Auto Scaling Group.
- `aws.autoscaling.Attachment` associates the Auto Scaling Group with the Elastic Load Balancer.
- A `pulumi.export` statement is used at the end to output the DNS name of the Load Balancer, through which you can access the TensorFlow Serving endpoint.
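
For the Auto Scaling Group to actually grow and shrink with demand, it needs a scaling policy. A minimal sketch, assuming a target-tracking policy on average CPU utilization is appropriate for your workload: the helper name `cpu_target_tracking_args` and the 60% target are illustrative assumptions, not values from the program above. The helper only builds the arguments, so it can be checked without deploying anything.

```python
def cpu_target_tracking_args(asg_name, target_cpu_percent=60.0):
    """Build keyword arguments for aws.autoscaling.Policy (hypothetical helper).

    The policy asks Auto Scaling to add or remove instances so that the
    group's average CPU utilization stays near target_cpu_percent.
    """
    if not 0 < target_cpu_percent <= 100:
        raise ValueError('target_cpu_percent must be in (0, 100]')
    return {
        'autoscaling_group_name': asg_name,
        'policy_type': 'TargetTrackingScaling',
        'target_tracking_configuration': {
            'predefined_metric_specification': {
                'predefined_metric_type': 'ASGAverageCPUUtilization',
            },
            'target_value': target_cpu_percent,
        },
    }


# Inside the Pulumi program (requires pulumi_aws), this could be used as:
# aws.autoscaling.Policy('tf-serving-cpu-policy',
#     **cpu_target_tracking_args(tf_serving_auto_scaling_group.name))
```

CPU utilization is only one possible signal; for GPU-bound or latency-sensitive models, a different metric may track load better.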

Before running this Pulumi program, replace the placeholders (like the AMI ID and subnet IDs) with actual values from your AWS account setup. The AMI should have TensorFlow Serving installed and configured to start on boot. Make sure to configure Pulumi to connect to your AWS account, and choose an appropriate region for deploying your resources.
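
Once the stack is deployed, you can smoke-test the endpoint from any machine with Python. The following is a sketch using only the standard library, assuming the load balancer forwards port 80 to TensorFlow Serving's REST API (port 8501 by default) and that the model is named `my_model`; both the model name and the helper function names are placeholders.

```python
import json
import urllib.request


def model_status_url(elb_dns, model_name):
    """URL of TensorFlow Serving's model status endpoint behind the ELB."""
    return f'http://{elb_dns}/v1/models/{model_name}'


def build_predict_request(elb_dns, model_name, instances):
    """Build a POST request for the REST predict endpoint."""
    body = json.dumps({'instances': instances}).encode('utf-8')
    return urllib.request.Request(
        url=f'{model_status_url(elb_dns, model_name)}:predict',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST')


def predict(elb_dns, model_name, instances, timeout=10):
    """Send the request and return the 'predictions' field of the response."""
    request = build_predict_request(elb_dns, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())['predictions']


# Example call; the input shape depends entirely on your model:
# predict('<value of the tf_serving_elb_dns output>', 'my_model', [[1.0, 2.0]])
```

You can read the DNS name with `pulumi stack output tf_serving_elb_dns`.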