Running Large Language Models on ECS with Spot Capacity Providers

Question

Pulumi · Accepted Answer

Running large language models requires substantial compute, which is expensive when provisioned as On-Demand instances. Amazon ECS (Elastic Container Service) runs containerized applications and can place them on Spot Instances to cut costs. Spot Instances are spare EC2 capacity that AWS sells at a significant discount compared to the On-Demand price; in exchange, AWS can reclaim an instance with a two-minute warning, so the workload has to tolerate interruptions.

Amazon ECS lets you define a capacity provider strategy that includes Spot Instances. On an EC2-based cluster, a capacity provider wraps an Auto Scaling Group; when that group launches Spot Instances, ECS places your containerized model workloads on them and scales the group as the number of tasks changes, which lowers cost compared to an all On-Demand cluster.
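To make the idea of a capacity provider strategy more concrete, here is a small pure-Python sketch of how a strategy's `base` and `weight` values spread tasks across providers. The `StrategyItem` and `split_tasks` names and the provider names are hypothetical, and the rounding rule is an assumption for illustration; ECS's exact placement may differ.

```python
from dataclasses import dataclass


@dataclass
class StrategyItem:
    capacity_provider: str  # hypothetical provider name
    weight: int             # relative share of tasks beyond the base
    base: int = 0           # minimum tasks on this provider (ECS allows a base on one provider only)


def split_tasks(desired_count: int, strategy: list[StrategyItem]) -> dict[str, int]:
    """Approximate how many tasks land on each capacity provider."""
    placement = {item.capacity_provider: 0 for item in strategy}
    remaining = desired_count

    # Step 1: satisfy each provider's base first.
    for item in strategy:
        take = min(item.base, remaining)
        placement[item.capacity_provider] += take
        remaining -= take

    total_weight = sum(item.weight for item in strategy)
    if remaining == 0 or total_weight == 0:
        return placement

    # Step 2: split the rest in proportion to weight (whole tasks only).
    shares = [(remaining * item.weight / total_weight, item) for item in strategy]
    for share, item in shares:
        placement[item.capacity_provider] += int(share)

    # Step 3: hand out tasks lost to rounding, largest fractional share first.
    leftover = desired_count - sum(placement.values())
    by_fraction = sorted(shares, key=lambda s: s[0] - int(s[0]), reverse=True)
    for _, item in by_fraction[:leftover]:
        placement[item.capacity_provider] += 1
    return placement


strategy = [
    StrategyItem("on-demand-cp", weight=1, base=1),
    StrategyItem("spot-cp", weight=3),
]
print(split_tasks(9, strategy))
```

With a base of 1 on the On-Demand provider and a 1:3 weight ratio, nine tasks come out as three On-Demand and six Spot. The program in this answer uses a single provider with `weight=1`, so every task goes to the Spot-backed group.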

The Pulumi program below, written in Python, sets up an ECS cluster with a Spot-backed capacity provider. It defines the following resources:

- ECS Cluster: A logical grouping of tasks or services within ECS.
- Spot Capacity Provider: A capacity provider backed by an Auto Scaling Group of Spot Instances, which lets ECS use those instances as cluster capacity.
- ECS Task Definition: A blueprint for your application that specifies the container definitions and requirements.
- ECS Service: Keeps the desired number of tasks from the Task Definition running and replaces any that stop.

We will create these resources using Pulumi's AWS provider.

```python
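import json

import pulumi
import pulumi_aws as aws

# Inputs that must already exist. The config key names below are placeholders
# chosen for this example; rename them to suit your stack.
config = pulumi.Config()
model_container_image = config.require("modelContainerImage")
subnet_ids = config.require_object("subnetIds")  # Private subnets for the tasks
# The Auto Scaling Group must launch Spot Instances from an ECS-optimized AMI
# and register them with the cluster created below.
autoscaling_group = aws.autoscaling.get_group(name=config.require("autoScalingGroupName"))
security_group = aws.ec2.get_security_group(id=config.require("securityGroupId"))
execution_role = aws.iam.get_role(name=config.require("executionRoleName"))

# Create an ECS cluster
ecs_cluster = aws.ecs.Cluster("ecs-cluster")

# Define a capacity provider backed by the Spot Auto Scaling Group
capacity_provider = aws.ecs.CapacityProvider("spot-capacity-provider",
    auto_scaling_group_provider=aws.ecs.CapacityProviderAutoScalingGroupProviderArgs(
        auto_scaling_group_arn=autoscaling_group.arn,
        managed_scaling=aws.ecs.CapacityProviderAutoScalingGroupProviderManagedScalingArgs(
            status="ENABLED",
            target_capacity=70,  # Keep instances about 70% utilized, leaving headroom for new tasks
        ),
        # Requires scale-in protection to be enabled on the Auto Scaling Group.
        # Blocks scale-in of instances that run tasks; it does not prevent Spot interruptions.
        managed_termination_protection="ENABLED",
    ),
    name="MySpotCapacityProvider"
)

# Ensure that the capacity provider is associated with the ECS cluster
ecs_cluster_capacity_providers = aws.ecs.ClusterCapacityProviders("ecs-cluster-capacity-providers",
    capacity_providers=[capacity_provider.name],
    cluster_name=ecs_cluster.name,
    default_capacity_provider_strategy=[aws.ecs.ClusterDefaultCapacityProviderStrategyArgs(
        capacity_provider=capacity_provider.name,
        weight=1,
    )]
)

# Create an EC2 Task Definition for the language model
task_definition = aws.ecs.TaskDefinition("app-task-def",
    family="app-task",
    cpu="1024",     # Placeholder sizes; a real model usually needs far more
    memory="2048",
    network_mode="awsvpc",
    requires_compatibilities=["EC2"],  # Tasks run on the Spot-backed EC2 instances, not Fargate
    execution_role_arn=execution_role.arn,
    # Build the JSON with json.dumps so quoting and escaping are always valid.
    # Output.from_input accepts either a plain string or a Pulumi Output.
    container_definitions=pulumi.Output.from_input(model_container_image).apply(
        lambda image: json.dumps([{
            "name": "app-container",
            "image": image,
            "cpu": 512,
            "memory": 1024,
            "essential": True,
            "portMappings": [{"containerPort": 8080, "hostPort": 8080}],
            "environment": [{"name": "MODEL_SIZE", "value": "large"}],
        }])
    )
)

# Create an ECS Service to run and maintain the desired number of tasks of the language model
ecs_service = aws.ecs.Service("app-service",
    cluster=ecs_cluster.name,
    task_definition=task_definition.arn,
    desired_count=3,
    # Place tasks through the Spot capacity provider. Setting launch_type instead
    # would bypass the capacity provider strategy.
    capacity_provider_strategies=[aws.ecs.ServiceCapacityProviderStrategyArgs(
        capacity_provider=capacity_provider.name,
        weight=1,
    )],
    network_configuration=aws.ecs.ServiceNetworkConfigurationArgs(
        subnets=subnet_ids,  # Subnets should be configured according to your VPC setup
        security_groups=[security_group.id],  # The security group assigned to the tasks
        # assign_public_ip is omitted: it is only supported for Fargate tasks, so
        # awsvpc tasks on EC2 need a NAT gateway or VPC endpoints to pull images.
    ),
    # Wait until the capacity provider is attached to the cluster
    opts=pulumi.ResourceOptions(depends_on=[ecs_cluster_capacity_providers]),
)

# Export the service name so it can be easily accessed, for example in CI/CD scripts or other automation
pulumi.export('service_name', ecs_service.name)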
```

In this program:

- We start by creating an ECS Cluster to organize our services.
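- Next, a Spot Capacity Provider is defined with managed scaling at a target capacity of 70%, which means ECS scales the Auto Scaling Group so that its instances stay about 70% utilized and leave headroom for new tasks. Managed termination protection stops the Auto Scaling Group from terminating instances that still run tasks during scale-in; it does not prevent Spot interruptions.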
- We then ensure that our ECS Cluster is tied to the defined Spot Capacity Provider.
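- We create an ECS Task Definition that specifies the container definition, including CPU, memory, and environment variables. The CPU and memory values shown are placeholders; size them for your model, which will typically need far more memory.
- An ECS Service maintains the desired count of tasks running our language model. For those tasks to land on the Spot-backed instances, the service has to use a capacity provider strategy with EC2-compatible tasks; a service pinned to `launch_type="FARGATE"` bypasses the capacity provider entirely.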
- Finally, we export the service's name as a stack output for ease of reference.
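The 70% target in the managed scaling settings can be read as simple arithmetic. ECS managed scaling tracks a `CapacityProviderReservation` metric, roughly the number of instances the tasks need divided by the number of instances running, times 100. The sketch below is a simplified model of that relationship; the function names are hypothetical, and real scaling also depends on step sizes and instance warm-up.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """Simplified reservation metric: needed / running * 100."""
    return instances_needed / instances_running * 100


def instances_for_target(instances_needed: int, target_capacity: int) -> int:
    """Instances the group settles at so the reservation stays at or below the target."""
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    # Integer ceiling of instances_needed * 100 / target_capacity.
    return (instances_needed * 100 + target_capacity - 1) // target_capacity


needed = 7  # instances required to fit all current tasks
running = instances_for_target(needed, target_capacity=70)
print(running, capacity_provider_reservation(needed, running))
```

Under this model, tasks that need 7 instances lead to 10 running instances at a 70% target, so about 30% of the capacity stays free. A target of 100 would keep no spare capacity.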

This program assumes that you already have a container image for the language model, a task execution role, and an Auto Scaling Group whose launch template requests Spot Instances from an ECS-optimized AMI and registers them with this cluster. Point `model_container_image`, `execution_role`, and `autoscaling_group` at your own image, role, and Auto Scaling Group, and set `subnet_ids` and `security_group` according to your VPC and security configuration.
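For the instances in the Auto Scaling Group to join the cluster, their launch template normally carries user data that writes the cluster name into the ECS agent configuration. A minimal sketch of generating that user data follows; the helper name `build_ecs_user_data` is hypothetical, and it assumes an ECS-optimized Amazon Linux AMI where the agent reads `/etc/ecs/ecs.config`.

```python
import base64


def build_ecs_user_data(cluster_name: str) -> str:
    """Return base64-encoded user data that joins an instance to the cluster."""
    script = "\n".join([
        "#!/bin/bash",
        # Tell the ECS agent which cluster to register with.
        f"echo 'ECS_CLUSTER={cluster_name}' >> /etc/ecs/ecs.config",
        # Drain tasks from the instance when a Spot interruption notice arrives.
        "echo 'ECS_ENABLE_SPOT_INSTANCE_DRAINING=true' >> /etc/ecs/ecs.config",
    ]) + "\n"
    # EC2 launch templates expect user data to be base64-encoded.
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


print(base64.b64decode(build_ecs_user_data("my-cluster")).decode("utf-8"))
```

If the launch template were managed in the same Pulumi program, the cluster name would be an Output, so the helper would likely be called as `ecs_cluster.name.apply(build_ecs_user_data)` and passed as the template's `user_data`.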

Provide Pulumi with AWS credentials for your account and run the program from a Python environment where the Pulumi CLI is installed. Running `pulumi up` creates this infrastructure on AWS.