1. High-Throughput Batch Inference with ECS and Capacity Providers


    To set up high-throughput batch inference on AWS, we'll use Amazon ECS (Elastic Container Service) combined with Capacity Providers. Capacity Providers are a feature of ECS that manage the infrastructure scaling for you, based on the requirements of your application.

    Here's what we're going to do:

    1. Define an ECS Cluster: The ECS Cluster is the logical grouping of tasks or services.
    2. Create a Capacity Provider: We need to specify the details of how we want our infrastructure to scale.
    3. Associate the Capacity Provider with the ECS Cluster: This binds our scaling strategy to our cluster.
    4. Set up a Compute Environment for AWS Batch: AWS Batch will leverage ECS to run our batch jobs.
    5. Create a Task Definition for Batch Jobs: Defines the Docker image to use for the batch job and how the container is configured.
    6. Create a Batch Job Queue: Holds the batch jobs that are ready to be run on the compute resources.

    Let's break down the code.

    1. ECS Cluster: This is where tasks and services are run. We define an ECS cluster with the necessary configuration.

    2. ECS Capacity Provider: We configure the capacity provider with an Auto Scaling Group and set parameters for managed scaling to control how instances scale in and out (see the Auto Scaling Group sketch after this list).

    3. Cluster Capacity Provider Associations: This associates the defined capacity providers with our ECS cluster.

    4. Compute Environment for Batch Jobs: We set up a managed, EC2-backed compute environment that AWS Batch uses to provision and scale the instances that run our batch jobs.

    5. Task Definition: This includes the inference job's container definitions, CPU and memory requirements, and other settings.

    6. Batch Job Queue: We create a queue that holds submitted jobs; AWS Batch pulls jobs from this queue and runs them as compute resources become available.
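    The capacity provider in step 2 expects an existing Auto Scaling Group. As a minimal sketch of how one might be created (the AMI ID and instance profile name below are placeholders, and ecs_cluster refers to the cluster created in the main program further down):

    import base64
    import pulumi_aws as aws

    # Launch template for the ECS container instances. The AMI ID and
    # instance profile name are placeholders to replace with your own.
    launch_template = aws.ec2.LaunchTemplate("batch-inference-lt",
        image_id="ami-0123456789abcdef0",  # Replace with an ECS-optimized AMI for your region
        instance_type="m4.large",
        iam_instance_profile={"name": "ecsInstanceProfile"},  # Replace with your instance profile
        # Register each instance with the ECS cluster on boot
        user_data=ecs_cluster.name.apply(
            lambda name: base64.b64encode(
                f"#!/bin/bash\necho ECS_CLUSTER={name} >> /etc/ecs/ecs.config".encode()
            ).decode()
        ),
    )

    # Auto Scaling Group whose ARN feeds the capacity provider
    asg = aws.autoscaling.Group("batch-inference-asg",
        min_size=0,
        max_size=10,
        vpc_zone_identifiers=aws_subnet_ids,  # Replace with your subnet IDs
        launch_template={"id": launch_template.id, "version": "$Latest"},
        tags=[{
            "key": "AmazonECSManaged",
            "value": "true",
            "propagate_at_launch": True,
        }],
    )

    With a group like this in place, asg.arn can stand in for the aws_autoscaling_group_arn placeholder used below.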

    The following code sets up this infrastructure:

    import pulumi
    import pulumi_aws as aws

    # Create an ECS cluster to house our services and tasks
    ecs_cluster = aws.ecs.Cluster("batch-inference-cluster")

    # Define an ECS Capacity Provider.
    # This requires an existing Auto Scaling Group (or the creation of a new one).
    capacity_provider = aws.ecs.CapacityProvider("batch-inference-capacity-provider",
        auto_scaling_group_provider={
            "auto_scaling_group_arn": aws_autoscaling_group_arn,  # Replace with your Auto Scaling group ARN
            "managed_scaling": {
                "status": "ENABLED",
                "target_capacity": 75,
            },
        }
    )

    # Associate the Capacity Provider with the ECS Cluster
    cluster_capacity_provider_associations = aws.ecs.ClusterCapacityProviders(
        "batch-inference-capacity-providers",
        cluster_name=ecs_cluster.name,
        capacity_providers=[capacity_provider.name]
    )

    # Define the Compute Environment for AWS Batch
    compute_env = aws.batch.ComputeEnvironment("batch-inference-compute-env",
        type="MANAGED",  # AWS Batch manages the underlying capacity
        compute_resources={
            "type": "EC2",
            "min_vcpus": 0,
            "max_vcpus": 100,
            "instance_types": ["m4.large"],
            "subnets": aws_subnet_ids,  # Replace with your subnet IDs
            "security_group_ids": aws_security_group_ids,  # Replace with your security group IDs
            "instance_role": aws_instance_profile_arn,  # Replace with your ECS instance profile ARN
        },
        service_role=aws_batch_service_role_arn  # Replace with your AWS Batch service role ARN
    )

    # Create the task definition for high-throughput batch jobs
    task_definition = aws.ecs.TaskDefinition("batch-inference-task",
        family="batch-inference-task",
        cpu="256",  # Adjust as necessary
        memory="512",  # Adjust as necessary
        network_mode="awsvpc",
        requires_compatibilities=["EC2"],  # Matches the EC2-backed capacity provider above
        execution_role_arn=aws_execution_role_arn,  # Replace with your task execution role ARN
        container_definitions=inference_container_definition  # Replace with your JSON container definition
    )

    # Finally, create a Batch Job Queue
    job_queue = aws.batch.JobQueue("batch-inference-job-queue",
        compute_environments=[compute_env.arn],
        priority=1,
        state="ENABLED"
    )

    # Export the ARNs of resources so we can interoperate with other AWS services or applications
    pulumi.export("cluster_arn", ecs_cluster.arn)
    pulumi.export("capacity_provider_arn", capacity_provider.arn)
    pulumi.export("compute_env_arn", compute_env.arn)
    pulumi.export("job_queue_arn", job_queue.arn)
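
    Once the stack is deployed, jobs can be submitted to the queue. A minimal sketch using boto3 (the job definition name inference-job-def is a hypothetical placeholder; AWS Batch requires a job definition registered separately from the ECS task definition above):

    import boto3

    batch = boto3.client("batch")

    # Submit a single inference job. The queue and job definition names are
    # assumptions; use the physical names from your deployed stack (Pulumi
    # may append a suffix to "batch-inference-job-queue").
    response = batch.submit_job(
        jobName="inference-run-001",
        jobQueue="batch-inference-job-queue",
        jobDefinition="inference-job-def",
    )
    print("Submitted job:", response["jobId"])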

    In this code:

    • aws_autoscaling_group_arn is the ARN of your Auto Scaling group.
    • aws_subnet_ids are the IDs of the subnets in which the compute environment's EC2 instances should launch.
    • aws_security_group_ids are the IDs of the security groups attached to those instances.
    • aws_instance_profile_arn is the ARN of the ECS instance profile those instances assume.
    • aws_batch_service_role_arn is the ARN of the role that AWS Batch will assume.
    • aws_execution_role_arn is the ARN of the execution role that the ECS tasks will assume.
    • inference_container_definition is the container definition, in JSON format, that defines the Docker container to be used in the task.

    Make sure to replace the placeholders (aws_autoscaling_group_arn, aws_subnet_ids, etc.) with your actual resource information. The container definition for the inference job will need to specify the image to use, along with any environment variables, volumes, and other configurations your application requires.
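
    As a sketch of what inference_container_definition might contain (the image URI and environment variable below are placeholders):

    import json

    # Hypothetical container definition for the inference task. Replace the
    # image URI and environment variables with your own.
    inference_container_definition = json.dumps([{
        "name": "inference",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest",
        "cpu": 256,
        "memory": 512,
        "essential": True,
        "environment": [
            {"name": "MODEL_PATH", "value": "/opt/model"},
        ],
    }])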

    This setup provides a foundation for high-throughput batch processing using AWS ECS and Capacity Providers, giving you the ability to scale efficiently based on workload demands.
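
    Because the ARNs are exported, other Pulumi stacks can consume them. A minimal sketch with pulumi.StackReference (the stack name my-org/batch-inference/prod is a hypothetical placeholder):

    import pulumi

    # Reference the batch-inference stack from another Pulumi program
    # and read its exported job queue ARN.
    infra = pulumi.StackReference("my-org/batch-inference/prod")
    job_queue_arn = infra.get_output("job_queue_arn")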