Scalable Inference Service with AWS Auto Scaling Lifecycle Hooks

Question

Pulumi · Accepted Answer

Auto Scaling adjusts the number of compute instances automatically to match demand. When building a scalable inference service on AWS, you will typically use an Auto Scaling Group (ASG) that grows or shrinks based on metrics such as CPU utilization or the number of requests the service is handling. Lifecycle hooks pause an instance as it launches or terminates so that you can run custom actions, such as warming up an instance before it starts serving traffic or draining connections before it is terminated.

To set up an Auto Scaling lifecycle hook, you first need an Auto Scaling Group, which sets the minimum, maximum, and desired number of instances. You then attach a lifecycle hook to the group so that a custom action runs at a specific point in the instance lifecycle. Scaling policies, which decide when the group scales in or out, are configured separately and are not part of the program below.

Below is a Pulumi program written in Python that creates an Auto Scaling Group with Lifecycle Hooks for a scalable AWS inference service. The program will perform the following actions:

1. Define the launch configuration for the instances that will be part of the ASG. This includes the AMI, the instance type, and other settings.
2. Create an Auto Scaling Group that uses this launch configuration, along with the desired, minimum, and maximum number of instances.
3. Attach a lifecycle hook to the ASG that triggers when instances launch (a hook can target the terminating transition instead).

Let's go through how to set this up.

import pulumi
import pulumi_aws as aws

# Define the launch configuration for the instances to be used in the Auto Scaling Group.
launch_config = aws.ec2.LaunchConfiguration(
    "app-launch-config",
    image_id="ami-0c55b159cbfafe1f0",  # Replace with the AMI ID of your choice
    instance_type="t2.medium"  # The instance type to use for your inference service
)

# Create an Auto Scaling Group that refers to the above launch configuration.
asg = aws.autoscaling.Group(
    "app-auto-scaling-group",
    launch_configuration=launch_config.name,  # The ASG references the launch configuration by name
    desired_capacity=2,  # Start with 2 instances
    min_size=1,          # Never scale in below 1 instance
    max_size=5,          # Never scale out above 5 instances
    vpc_zone_identifiers=[  # Subnet IDs
        "subnet-123456",  # Replace with your actual subnet IDs
        "subnet-456789",
    ],
    tags=[
        {
            "key": "Name",
            "value": "inference-instance",
            "propagate_at_launch": True,
        }
    ],
)

# Add a Lifecycle Hook to the Auto Scaling Group for instance launching.
lifecycle_hook = aws.autoscaling.LifecycleHook(
    "app-lifecycle-hook",
    autoscaling_group_name=asg.name,
    lifecycle_transition="autoscaling:EC2_INSTANCE_LAUNCHING",
    notification_target_arn="<SNS_TOPIC_ARN>",  # Replace with your SNS topic ARN
    role_arn="<IAM_ROLE_ARN>",                 # Replace with an IAM role ARN with SNS publish permissions
    default_result="CONTINUE",
    heartbeat_timeout=120,  # Seconds the instance waits before default_result is applied
    notification_metadata="<NOTIFICATION_METADATA>",  # Optional static text sent with every notification; the instance ID is already part of the message
)

# Export the ASG name and ID so you can easily reference them if needed.
pulumi.export('asg_name', asg.name)
pulumi.export('asg_id', asg.id)

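In the above program, replace ami-0c55b159cbfafe1f0 with the AMI ID that you want your instances to use, and choose an instance type that meets the requirements of your inference service. The subnet IDs subnet-123456 and subnet-456789 should also be replaced with the subnets in your VPC that you want the Auto Scaling Group to use. Note that AWS now recommends launch templates over launch configurations, so for new deployments you may prefer aws.ec2.LaunchTemplate together with the launch_template argument of the Auto Scaling Group.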

The lifecycle hook notifies an SNS topic when an instance is launching. Replace <SNS_TOPIC_ARN> with the ARN of your SNS topic and <IAM_ROLE_ARN> with the ARN of an IAM role that Auto Scaling can assume and that is allowed to publish to that topic. While the hook is active, the instance waits until the lifecycle action is completed or the heartbeat timeout (120 seconds here) expires. The default_result parameter determines what the group does when that timeout expires. Here the launch proceeds and the instance goes into service (CONTINUE); set it to ABANDON if you would rather have the instance terminated and replaced when the lifecycle action isn't completed in time.
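Something has to tell Auto Scaling that the launch action is finished, otherwise every instance waits for the full heartbeat timeout. As a rough sketch, assuming the SNS topic invokes an AWS Lambda function (a setup that is not part of the program above), the handler below reads the lifecycle notification and calls boto3's complete_lifecycle_action. The names build_completion_params and handler are illustrative, and the warm-up work itself is left out.

```python
import json


def build_completion_params(sns_message, result="CONTINUE"):
    """Turn a lifecycle hook notification (a JSON string) into keyword
    arguments for complete_lifecycle_action.

    Returns None for the test notification that Auto Scaling sends when a
    notification target is first configured.
    """
    body = json.loads(sns_message)
    if body.get("Event") == "autoscaling:TEST_NOTIFICATION":
        return None
    return {
        "AutoScalingGroupName": body["AutoScalingGroupName"],
        "LifecycleHookName": body["LifecycleHookName"],
        "LifecycleActionToken": body["LifecycleActionToken"],
        "LifecycleActionResult": result,
    }


def handler(event, context):
    """Hypothetical Lambda entry point subscribed to the hook's SNS topic."""
    # Imported here so the parsing helper can be used without AWS access.
    import boto3

    client = boto3.client("autoscaling")
    for record in event["Records"]:
        params = build_completion_params(record["Sns"]["Message"])
        if params is not None:
            # Assumption: any warm-up has already succeeded at this point.
            client.complete_lifecycle_action(**params)
```

Passing result="ABANDON" instead would ask Auto Scaling to terminate the instance, which is the natural choice if the warm-up step fails.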

Finally, the outputs "asg_name" and "asg_id" will show the Auto Scaling Group's name and ID, which can be used in other Pulumi stacks or for reference purposes.

To provision these resources, run pulumi up from the CLI, or drive the same deployment through Pulumi's Automation API.
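The program above registers only a launch hook. For the connection-draining case mentioned at the start, a second hook on the autoscaling:EC2_INSTANCE_TERMINATING transition would be needed. The helper below is a minimal sketch that assembles and sanity-checks keyword arguments for aws.autoscaling.LifecycleHook. The name lifecycle_hook_args and its defaults are assumptions made for illustration; the 30 to 7200 second bounds follow the heartbeat range documented for Auto Scaling.

```python
VALID_TRANSITIONS = {
    "autoscaling:EC2_INSTANCE_LAUNCHING",
    "autoscaling:EC2_INSTANCE_TERMINATING",
}
VALID_RESULTS = {"CONTINUE", "ABANDON"}


def lifecycle_hook_args(asg_name, transition, heartbeat_timeout=300,
                        default_result="CONTINUE"):
    """Build keyword arguments for a lifecycle hook resource.

    asg_name may be a plain string or a Pulumi output such as asg.name;
    it is passed through untouched.
    """
    if transition not in VALID_TRANSITIONS:
        raise ValueError(f"unknown lifecycle transition: {transition}")
    if default_result not in VALID_RESULTS:
        raise ValueError(f"default_result must be one of {sorted(VALID_RESULTS)}")
    if not 30 <= heartbeat_timeout <= 7200:
        raise ValueError("heartbeat_timeout must be between 30 and 7200 seconds")
    return {
        "autoscaling_group_name": asg_name,
        "lifecycle_transition": transition,
        "heartbeat_timeout": heartbeat_timeout,
        "default_result": default_result,
    }
```

With that in place, the draining hook could be declared as aws.autoscaling.LifecycleHook("app-terminate-hook", **lifecycle_hook_args(asg.name, "autoscaling:EC2_INSTANCE_TERMINATING")). On the terminating transition the instance is terminated under either result; ABANDON only skips any remaining lifecycle actions.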