Batch Processing Workloads for AI Inference with AWS ECS

Pulumi · Accepted Answer

To implement batch processing workloads for AI inference with AWS ECS using Pulumi, you will typically need to set up the following resources:

1. **Amazon ECS Cluster**: This is the core of your containerized application, providing the infrastructure to manage your services and run your containers.
2. **Task Definition**: This specifies your application with one or more container definitions, volume definitions, and other parameters for a batch task.
3. **Compute Environment**: For AWS Batch, this defines the computing resources that your batch jobs will use.
4. **Job Definition**: This defines how batch jobs are to be run, such as the Docker image and command to use, resource requirements, and retry strategies.
5. **Job Queue**: This is where batch jobs are submitted. It prioritizes and dispatches jobs to run in the compute environment.
6. **Service**: An ECS service keeps a specified number of instances of a task definition running at all times. It suits long-running workloads rather than run-to-completion batch jobs, so the program below does not create one.

Here’s a Pulumi Python program that sets up a simple batch processing workload for AI inference. The program includes:

- An ECS cluster where your tasks are placed.
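- An IAM execution role that lets ECS pull the container image and write logs to CloudWatch on behalf of your tasks.
- A CloudWatch log group that receives the container logs. (A separate job role is only needed if the inference code itself calls AWS APIs, for example to read from S3; the program does not define one.)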
- A container definition and task definition for running the AI inference workloads.
- A compute environment for AWS Batch to manage the computing resources.
- A job queue for submitting and prioritizing jobs to be processed by your compute environment.
- A job definition that describes the batch jobs to run.

Let's walk through the setup:

```python
import pulumi
import pulumi_aws as aws

# Create an ECS cluster
ecs_cluster = aws.ecs.Cluster("ai_batch_cluster")

# Use the AWS-managed policy for ECS task execution (image pulls and CloudWatch logging).
# Replace this ARN if you maintain your own execution policy.
ecs_exec_policy_arn = 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'

# Create an IAM role for the ECS task execution
ecs_exec_role = aws.iam.Role("ecs_exec_role",
                             assume_role_policy=aws.iam.get_policy_document(statements=[
                                 aws.iam.GetPolicyDocumentStatementArgs(
                                     actions=["sts:AssumeRole"],
                                     principals=[aws.iam.GetPolicyDocumentStatementPrincipalArgs(
                                         type="Service",
                                         identifiers=["ecs-tasks.amazonaws.com"])])]).json)

# Attach the ECS execution policy to the role
ecs_exec_role_policy_attachment = aws.iam.RolePolicyAttachment("ecs_exec_role_policy_attachment",
                                                              policy_arn=ecs_exec_policy_arn,
                                                              role=ecs_exec_role.name)

# Create a log group for the ECS tasks
log_group = aws.cloudwatch.LogGroup("ai_batch_log_group",
                                    retention_in_days=30)

# Task definition for the AI inference workload
task_definition = aws.ecs.TaskDefinition("ai_batch_task_definition",
                                         family="ai_batch_processing",
                                         cpu="256",  # 0.25 vCPU
                                         memory="1024",  # 1 GB
                                         network_mode="awsvpc",
                                         requires_compatibilities=["FARGATE"],
                                         execution_role_arn=ecs_exec_role.arn,
                                         # Output.json_dumps resolves the log group name and emits valid JSON
                                         # (a raw JSON string cannot contain comments).
                                         container_definitions=pulumi.Output.json_dumps([{
                                             "name": "inference-container",
                                             "image": "my_inference_image",  # Replace with your actual image path
                                             "cpu": 256,
                                             "memory": 1024,
                                             "essential": True,
                                             "logConfiguration": {
                                                 "logDriver": "awslogs",
                                                 "options": {
                                                     "awslogs-group": log_group.name,
                                                     "awslogs-region": "us-west-2",  # Replace with your actual region
                                                     "awslogs-stream-prefix": "ecs",
                                                 },
                                             },
                                         }]))

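# Define a managed Fargate compute environment for AWS Batch.
# service_role is omitted so that Batch uses its service-linked role
# (AWSServiceRoleForBatch); the ECS task execution role cannot be used here
# because its trust policy only allows ecs-tasks.amazonaws.com.
compute_environment = aws.batch.ComputeEnvironment("ai_batch_compute_environment",
                                                   type="MANAGED",
                                                   compute_resources=aws.batch.ComputeEnvironmentComputeResourcesArgs(
                                                       type="FARGATE",
                                                       max_vcpus=16,  # min_vcpus does not apply to Fargate
                                                       subnets=["subnet-xxxxxx"],  # Replace with your actual subnets
                                                       security_group_ids=["sg-xxxxxx"],  # Replace with your actual security groups
                                                   ))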

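# Create a job queue for the batch jobs.
# Note: recent pulumi_aws releases deprecate compute_environments in favor of
# compute_environment_orders; check the argument name against your provider version.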
job_queue = aws.batch.JobQueue("ai_batch_job_queue",
                               state="ENABLED",
                               compute_environments=[compute_environment.arn],
                               priority=1)

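# Define the batch job definition.
# Fargate jobs declare CPU and memory through resourceRequirements, and the
# (vCPU, memory) pair must be one that Fargate supports (0.25 vCPU with 1024 MiB here).
# If the subnets are public and have no NAT gateway, also set
# "networkConfiguration": {"assignPublicIp": "ENABLED"} so the image can be pulled.
job_definition = aws.batch.JobDefinition("ai_batch_job_definition",
                                         type="container",
                                         platform_capabilities=["FARGATE"],
                                         container_properties=pulumi.Output.json_dumps({
                                             "image": "my_inference_image",  # Replace with your actual image path
                                             "resourceRequirements": [
                                                 {"type": "VCPU", "value": "0.25"},
                                                 {"type": "MEMORY", "value": "1024"},
                                             ],
                                             "executionRoleArn": ecs_exec_role.arn,
                                         }))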

# Export the ARN of the ECS cluster
pulumi.export('ecs_cluster_arn', ecs_cluster.arn)
# Export the ARN of the job queue
pulumi.export('job_queue_arn', job_queue.arn)
```

In this program, replace placeholders like `my_inference_image`, `subnet-xxxxxx`, `sg-xxxxxx`, and region strings with actual values suitable for your setup.
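
When you change the CPU and memory values as well, keep in mind that Fargate only accepts certain CPU/memory pairs. The helper below is a minimal sketch for checking a pair before deploying. The function name and the table are illustrative and not part of Pulumi or AWS SDKs; the table covers sizes up to 4 vCPU as documented by AWS at the time of writing, so verify it against the current Fargate documentation.

```python
# Supported Fargate sizes: CPU units (1024 = 1 vCPU) -> allowed memory in MiB.
# Assumption: values reflect the AWS documentation for sizes up to 4 vCPU.
FARGATE_MEMORY_BY_CPU = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """Return True if the CPU/memory pair appears in the table above."""
    return memory_mib in FARGATE_MEMORY_BY_CPU.get(cpu_units, [])


if __name__ == "__main__":
    # 0.25 vCPU with 1 GB, the size used by the task definition in this answer.
    print(is_valid_fargate_size(256, 1024))
    # 1 vCPU with 1 GB is rejected: 1 vCPU needs at least 2048 MiB.
    print(is_valid_fargate_size(1024, 1024))
```

Running a check like this before `pulumi up` catches an invalid size locally instead of at job submission time.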

Each resource in the program is connected: the `task_definition` references the log group for logging and the execution role for permissions to pull the container image and write to CloudWatch, and the job queue dispatches submitted jobs to the compute environment. Note that a managed AWS Batch compute environment creates and manages its own ECS cluster behind the scenes, so the `ai_batch_cluster` and `task_definition` resources are only needed if you also want to launch tasks directly through ECS.

Before running it, ensure that you have Pulumi installed and configured with your AWS credentials. You can then run `pulumi up` from the directory containing this program to create the infrastructure. The output of the command will show you the resources being created, and you can confirm the changes to proceed.

Once the infrastructure is provisioned, you can start submitting batch jobs to the job queue, referencing the job definition. AWS Batch launches each job as a Fargate task, and the compute environment scales the compute resources up and down, within its maximum vCPU limit, as needed to process the incoming jobs.
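
As a rough illustration of job submission, the sketch below builds a request for the AWS Batch `SubmitJob` API and sends it with boto3. The helper names, the job name, the command, and the environment variable are hypothetical; the job queue and job definition values are placeholders you would take from your own stack.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def build_submit_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    command: Optional[Sequence[str]] = None,
    environment: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Batch SubmitJob call."""
    request: Dict[str, Any] = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    overrides: Dict[str, Any] = {}
    if command:
        overrides["command"] = list(command)
    if environment:
        # Batch expects a list of {"name": ..., "value": ...} pairs.
        overrides["environment"] = [
            {"name": key, "value": value}
            for key, value in sorted(environment.items())
        ]
    if overrides:
        request["containerOverrides"] = overrides
    return request


def submit_inference_job(request: Dict[str, Any], region_name: str = "us-west-2") -> str:
    """Submit the job and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the request builder has no AWS dependency

    client = boto3.client("batch", region_name=region_name)
    response = client.submit_job(**request)
    return response["jobId"]


if __name__ == "__main__":
    request = build_submit_job_request(
        job_name="inference-batch-001",              # hypothetical job name
        job_queue="<job queue ARN>",                 # from the job_queue_arn stack output
        job_definition="<job definition name or ARN>",
        command=["python", "infer.py"],              # hypothetical entry point
        environment={"INPUT_PREFIX": "s3://my-bucket/inputs/"},  # hypothetical variable
    )
    print(request)
    # Uncomment once the placeholders above hold real values:
    # print(submit_inference_job(request))
```

The job queue ARN is available from `pulumi stack output job_queue_arn`. Keeping the request builder separate from the network call makes it easy to inspect or unit test what will be submitted.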