1. Auto-Scaling AI Workloads with ECS Task Definitions


    Auto-scaling your AI workloads on Amazon ECS (Elastic Container Service) involves creating an ECS cluster, defining task definitions, and using services that manage the running instances of your containerized applications. To scale the number of tasks up or down automatically with demand, you configure auto scaling policies. Here's the overall flow:

    1. ECS Cluster: The ECS cluster is a logical grouping of tasks or services. You'll need one cluster to place your tasks into.

    2. Task Definition: This is akin to a blueprint for your application. It defines things like which container image to use, the required CPU and memory, the necessary environment variables, and other settings.

    3. Service: The ECS Service keeps a specified number of instances of the task definition running, replacing them if they fail.

    4. Auto Scaling: Application Auto Scaling adjusts the number of running tasks in response to the workload. You register the ECS service as a scalable target and attach a scaling policy that defines how it should react to changes in demand.

    5. Auto Scaling Policy: The policy defines how to scale (e.g., increase or decrease the number of tasks) based on the specified metrics and thresholds.

    6. CloudWatch Alarms: With step scaling policies, CloudWatch alarms trigger the scaling action when, for example, the tasks' CPU usage goes above a certain threshold. (Target tracking policies manage their alarms automatically; see the step-scaling sketch after this list.)
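
    The full program below uses a target tracking policy, which creates and manages its CloudWatch alarms for you. If you prefer alarm-driven scaling instead, a minimal step-scaling sketch looks like this; it reuses the cluster, service, and scaling_target resources defined in the full program below, and the resource names cpu_step_policy and cpu_high_alarm are illustrative:

    # Step scaling: add one task whenever the alarm below is in breach.
    step_policy = aws.appautoscaling.Policy("cpu_step_policy",
        policy_type="StepScaling",
        resource_id=scaling_target.resource_id,
        scalable_dimension=scaling_target.scalable_dimension,
        service_namespace=scaling_target.service_namespace,
        step_scaling_policy_configuration=aws.appautoscaling.PolicyStepScalingPolicyConfigurationArgs(
            adjustment_type="ChangeInCapacity",
            cooldown=60,
            metric_aggregation_type="Average",
            step_adjustments=[aws.appautoscaling.PolicyStepScalingPolicyConfigurationStepAdjustmentArgs(
                metric_interval_lower_bound="0",
                scaling_adjustment=1,
            )],
        ))

    # CloudWatch alarm that fires when the service's average CPU exceeds 75%
    # for two consecutive minutes, invoking the step scaling policy above.
    cpu_alarm = aws.cloudwatch.MetricAlarm("cpu_high_alarm",
        comparison_operator="GreaterThanThreshold",
        evaluation_periods=2,
        metric_name="CPUUtilization",
        namespace="AWS/ECS",
        period=60,
        statistic="Average",
        threshold=75,
        dimensions={
            "ClusterName": cluster.name,
            "ServiceName": service.name,
        },
        alarm_actions=[step_policy.arn])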

    For the purpose of this example, let's create these components using Pulumi in Python.

    import json
    import pulumi
    import pulumi_aws as aws

    # Create an ECS cluster to hold your services.
    cluster = aws.ecs.Cluster("ai_cluster")

    # Create an IAM role that ECS uses to execute tasks.
    task_exec_role = aws.iam.Role("task_exec_role",
        assume_role_policy="""{
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "sts:AssumeRole",
                "Principal": {
                    "Service": "ecs-tasks.amazonaws.com"
                },
                "Effect": "Allow",
                "Sid": ""
            }]
        }""")

    # Attach the AWS managed policy that allows ECS tasks to pull images and write logs.
    task_exec_policy_attachment = aws.iam.RolePolicyAttachment("task_exec_policy_attachment",
        policy_arn="arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
        role=task_exec_role.name)

    # Define a task definition for your AI workload.
    task_definition = aws.ecs.TaskDefinition("ai_task_definition",
        family="ai_workload",
        cpu="256",
        memory="512",
        network_mode="awsvpc",
        requires_compatibilities=["FARGATE"],
        execution_role_arn=task_exec_role.arn,
        container_definitions=json.dumps([{
            "name": "ai_service",
            "image": "my-ai-image",  # replace with your container image
            "cpu": 256,
            "memory": 512,
            "essential": True,
            "portMappings": [{
                "containerPort": 80,
                "hostPort": 80,
            }],
        }]))

    # Create an ECS service to manage the tasks.
    # The service ensures the desired number of task instances is always running.
    service = aws.ecs.Service("ai_service",
        cluster=cluster.arn,
        task_definition=task_definition.arn,
        desired_count=2,
        launch_type="FARGATE",
        network_configuration=aws.ecs.ServiceNetworkConfigurationArgs(
            assign_public_ip=True,
            subnets=["subnet-xxx"],      # replace with your subnet IDs
            security_groups=["sg-xxx"],  # replace with your security group IDs
        ),
        load_balancers=[aws.ecs.ServiceLoadBalancerArgs(
            target_group_arn="arn:aws:elasticloadbalancing:region:12345:targetgroup/my-targets/123",
            container_name="ai_service",
            container_port=80,
        )],
        opts=pulumi.ResourceOptions(depends_on=[task_exec_policy_attachment]))

    # Register the ECS service as a scalable target with Application Auto Scaling.
    scaling_target = aws.appautoscaling.Target("scaling_target",
        max_capacity=4,
        min_capacity=2,
        resource_id=pulumi.Output.concat("service/", cluster.name, "/", service.name),
        scalable_dimension="ecs:service:DesiredCount",
        service_namespace="ecs")

    # Define how to scale: here, track 75% average CPU utilization.
    scaling_policy = aws.appautoscaling.Policy("scaling_policy",
        policy_type="TargetTrackingScaling",
        resource_id=scaling_target.resource_id,
        scalable_dimension=scaling_target.scalable_dimension,
        service_namespace=scaling_target.service_namespace,
        target_tracking_scaling_policy_configuration=aws.appautoscaling.PolicyTargetTrackingScalingPolicyConfigurationArgs(
            target_value=75.0,
            predefined_metric_specification=aws.appautoscaling.PolicyPredefinedMetricSpecificationArgs(
                predefined_metric_type="ECSServiceAverageCPUUtilization",
            ),
        ),
        opts=pulumi.ResourceOptions(depends_on=[scaling_target]))

    # Export the URL of the service's load balancer.
    alb = aws.lb.get_load_balancer(arn="arn:aws:elasticloadbalancing:region:12345:loadbalancer/app/my-load-balancer/123")
    pulumi.export("service_url", pulumi.Output.concat("http://", alb.dns_name))

    This program sets up an ECS cluster to host your services and defines a basic task definition that describes your AI workload. It then creates an ECS service to ensure a set number of these tasks are always running. Application Auto Scaling is configured to keep the number of tasks between 2 and 4, scaling on average CPU utilization via the target tracking policy.

    Please make sure to replace my-ai-image with the Docker image you want to deploy, subnet-xxx and sg-xxx with your actual subnet and security group IDs, and the target group and load balancer ARNs with your specific values.
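
    Rather than hardcoding those values, you can read them from Pulumi stack configuration. A minimal sketch, assuming config keys named aiImage, subnetIds, and securityGroupIds (illustrative names, set per stack with pulumi config set):

    config = pulumi.Config()

    # e.g. pulumi config set aiImage my-registry/my-ai-image
    #      pulumi config set --path 'subnetIds[0]' subnet-xxx
    ai_image = config.require("aiImage")
    subnet_ids = config.require_object("subnetIds")
    security_group_ids = config.require_object("securityGroupIds")

    These variables can then replace the literal strings in the task definition and service above.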

    If your AI workload needs more resources, adjust the CPU and memory in the task_definition, and tune the capacity range on the scaling_target and the target utilization in the scaling_policy to better suit your application's needs.
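
    For example, a heavier model server might use one of the larger Fargate pairings; CPU and memory must be a supported Fargate combination, and 1 vCPU (1024 CPU units) with 2048 MiB is one. This variant reuses task_exec_role from the program above and keeps the illustrative my-ai-image placeholder:

    # A larger task size: 1 vCPU with 2 GiB of memory (a valid Fargate pairing).
    task_definition = aws.ecs.TaskDefinition("ai_task_definition",
        family="ai_workload",
        cpu="1024",
        memory="2048",
        network_mode="awsvpc",
        requires_compatibilities=["FARGATE"],
        execution_role_arn=task_exec_role.arn,  # role defined in the program above
        container_definitions=json.dumps([{
            "name": "ai_service",
            "image": "my-ai-image",  # replace with your image
            "cpu": 1024,
            "memory": 2048,
            "essential": True,
            "portMappings": [{"containerPort": 80, "hostPort": 80}],
        }]))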

    Because the service declares a load_balancers block, ECS registers each task's IP address with the target group automatically and deregisters it when the task stops, so traffic is routed to your tasks without manual registration. The target group itself does need to be configured for Fargate, as sketched below.
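
    Fargate tasks run in awsvpc network mode and must be registered by IP, so the target group the service points at needs target_type="ip". A minimal sketch, with vpc-xxx as a placeholder:

    # Target group for Fargate tasks; "ip" targets are required for the
    # awsvpc network mode used by Fargate.
    target_group = aws.lb.TargetGroup("ai_target_group",
        port=80,
        protocol="HTTP",
        target_type="ip",
        vpc_id="vpc-xxx",  # replace with your VPC ID
        health_check=aws.lb.TargetGroupHealthCheckArgs(
            path="/",  # assumes your container answers health checks at /
        ))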