1. EC2 Auto Scaling for Variable AI Workloads


    Amazon EC2 Auto Scaling automatically adjusts the number of EC2 instances serving your application to match current demand, without constant manual intervention. This is particularly useful for AI workloads, which are often variable and computationally demanding.

    To set up EC2 Auto Scaling, you typically need the following components:

    1. Launch Configuration/Template: This specifies the instance type, the AMI, and other settings such as key pairs, security groups, and block device mappings; it serves as a blueprint for new instances. (Launch templates are the newer, recommended successor to launch configurations; this example uses a launch configuration for simplicity.)
    2. Auto Scaling Group: This group defines the collection of EC2 instances that you want to scale. You specify the minimum and maximum number of instances, desired capacity, and the subnets where instances can be launched.
    3. Scaling Policies: These are rules that determine when to launch or terminate instances based on various criteria like CPU utilization, network input/output, custom metrics, etc.

    Below is a program written in Python using Pulumi's AWS SDK, which demonstrates how to set up a basic Auto Scaling configuration suitable for variable AI workloads. The program does the following:

    • Defines a launch configuration for instances.
    • Creates an Auto Scaling Group with desired, min, and max capacities.
    • Creates a scaling policy to increase the number of instances if the average CPU utilization goes above a certain threshold.
```python
import pulumi
import pulumi_aws as aws

# Define the AMI to use: here, for example, we use the latest Amazon Linux 2 AMI
ami = aws.ec2.get_ami(
    most_recent=True,
    owners=["amazon"],
    filters=[{"name": "name", "values": ["amzn2-ami-hvm-*-x86_64-gp2"]}],
)

# Create a Launch Configuration for the Auto Scaling Group
launch_config = aws.autoscaling.LaunchConfiguration(
    "ai-workload-lc",
    image_id=ami.id,
    instance_type="t3.micro",  # Choose an instance type suitable for your workload
    key_name="myKeyPair",      # Replace with your key pair name
)

# Create an Auto Scaling Group
scaling_group = aws.autoscaling.Group(
    "ai-workload-asg",
    desired_capacity=1,
    max_size=3,  # Maximum number of instances during scaling
    min_size=1,  # Minimum number of instances during scaling
    health_check_type="EC2",
    health_check_grace_period=300,
    force_delete=True,
    launch_configuration=launch_config.id,
    vpc_zone_identifiers=["subnet-abcde012", "subnet-bcde012a"],  # Replace with your subnet IDs
)

# Define a Scaling Policy
scaling_policy = aws.autoscaling.Policy(
    "ai-workload-scaling-policy",
    autoscaling_group_name=scaling_group.name,
    adjustment_type="ChangeInCapacity",
    scaling_adjustment=1,  # Add 1 instance
    cooldown=300,  # Cooldown period before further scaling actions can be taken
)

# Attach a CloudWatch Alarm to the Scaling Policy.
# Note: alarm_actions must be ARNs, so we pass the policy's ARN rather than its ID.
cpu_alarm_high = aws.cloudwatch.MetricAlarm(
    "ai-workload-cpu-alarm-high",
    comparison_operator="GreaterThanThreshold",
    evaluation_periods=2,
    metric_name="CPUUtilization",
    namespace="AWS/EC2",
    period=120,
    statistic="Average",
    threshold=50,  # Set to the CPU utilization threshold
    alarm_actions=[scaling_policy.arn],
    dimensions={"AutoScalingGroupName": scaling_group.name},
)

# Output the Auto Scaling Group Name
pulumi.export("scaling_group_name", scaling_group.name)
```

    In this example:

    • We start by finding an AMI to launch our instances.
    • A launch configuration is defined using the found AMI and an instance type of t3.micro, which is suitable for small workloads. You should adjust the instance type according to your AI workload needs.
    • Then, we create an Auto Scaling Group with a desired capacity of 1 to start; based on demand, it can scale up to 3 instances.
    • We create a scaling policy that is triggered when the CPU utilization goes above 50%. For real-world AI workloads, you might have specific triggers based on the workload characteristics.
    • We attach a CloudWatch alarm that watches the CPU utilization metric. If it crosses the threshold of 50% over two consecutive periods of 120 seconds each, it will trigger the scaling policy.
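    The policy above only scales out. For variable AI workloads you will usually also want a scale-in path so capacity drops when demand falls. A minimal sketch mirroring the resources above (the resource names and the 20% threshold are illustrative, and scaling_group refers to the group defined in the program):

```python
import pulumi_aws as aws

# Scale-in policy: remove one instance when triggered
scale_in_policy = aws.autoscaling.Policy(
    "ai-workload-scale-in-policy",
    autoscaling_group_name=scaling_group.name,  # the Auto Scaling Group defined earlier
    adjustment_type="ChangeInCapacity",
    scaling_adjustment=-1,  # Remove 1 instance
    cooldown=300,
)

# Alarm that fires when average CPU stays below the threshold
# for two consecutive 120-second periods
cpu_alarm_low = aws.cloudwatch.MetricAlarm(
    "ai-workload-cpu-alarm-low",
    comparison_operator="LessThanThreshold",
    evaluation_periods=2,
    metric_name="CPUUtilization",
    namespace="AWS/EC2",
    period=120,
    statistic="Average",
    threshold=20,  # Illustrative threshold; tune for your workload
    alarm_actions=[scale_in_policy.arn],
    dimensions={"AutoScalingGroupName": scaling_group.name},
)
```

    The Auto Scaling Group's min_size still acts as a floor, so scale-in will never take the group below its minimum capacity.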

    Ensure you have the pulumi and pulumi_aws packages installed in your Python environment and your AWS credentials configured for the Pulumi CLI before running this program.

    Please replace placeholders such as key names and subnet IDs with actual values from your AWS environment. Additional configurations can be added to customize the Auto Scaling parameters to your specific AI workload requirements.
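    As an alternative to wiring simple scaling policies to CloudWatch alarms yourself, a target tracking policy lets you state a target metric value and have EC2 Auto Scaling create and manage the alarms for you, scaling both out and in. A sketch reusing the scaling_group from the program above (the resource name and 50% target are illustrative):

```python
import pulumi_aws as aws

# Target tracking: keep the group's average CPU utilization near 50%.
# EC2 Auto Scaling manages the underlying CloudWatch alarms automatically.
target_policy = aws.autoscaling.Policy(
    "ai-workload-target-tracking",
    autoscaling_group_name=scaling_group.name,  # group from the program above
    policy_type="TargetTrackingScaling",
    target_tracking_configuration={
        "predefined_metric_specification": {
            "predefined_metric_type": "ASGAverageCPUUtilization",
        },
        "target_value": 50.0,
    },
)
```

    For GPU-heavy AI workloads, CPU utilization may not reflect real load; a custom metric (for example, queue depth or GPU utilization published to CloudWatch) is often a better scaling signal.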