Running Large Language Models on ECS with Spot Capacity Providers
Running large language models, such as GPT-3 or similar, requires substantial computing resources, which can be expensive when provisioned as On-Demand Instances. Amazon ECS (Elastic Container Service) runs containerized applications and can use Spot Instances as a cost-effective alternative. Spot Instances are spare EC2 capacity that AWS offers at a significant discount compared to the On-Demand price; in exchange, AWS can reclaim them with a two-minute warning.
Amazon ECS allows you to define a capacity provider strategy that can include Spot Instances. An EC2 capacity provider wraps an Auto Scaling Group; when that group launches Spot Instances, ECS scales it to fit your containerized large language model workloads, reducing cost while keeping the desired number of tasks running.
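To make the idea of a capacity provider strategy concrete, here is a minimal sketch in plain Python of how a strategy's base and weight values divide tasks between providers. The provider names "on-demand" and "spot" and the helper split_tasks are hypothetical, and the rounding rule is an assumption chosen for illustration; ECS does not document its exact rounding, so treat the numbers as an approximation.

```python
from dataclasses import dataclass


@dataclass
class StrategyItem:
    provider: str   # capacity provider name (hypothetical names below)
    weight: int     # relative share of the tasks placed beyond the base
    base: int = 0   # tasks placed on this provider first (ECS allows a base on one provider only)


def split_tasks(desired_count: int, strategy: list[StrategyItem]) -> dict[str, int]:
    """Approximate how many tasks each capacity provider receives."""
    placed = {item.provider: 0 for item in strategy}
    remaining = desired_count

    # Satisfy each provider's base first.
    for item in strategy:
        take = min(item.base, remaining)
        placed[item.provider] += take
        remaining -= take

    # Split the rest in proportion to the weights (largest-remainder rounding,
    # which is an assumption of this sketch).
    total_weight = sum(item.weight for item in strategy)
    if remaining == 0 or total_weight == 0:
        return placed
    shares = [(remaining * item.weight / total_weight, item) for item in strategy]
    for share, item in shares:
        placed[item.provider] += int(share)
    leftover = remaining - sum(int(share) for share, _ in shares)
    by_fraction = sorted(shares, key=lambda s: s[0] - int(s[0]), reverse=True)
    for _, item in by_fraction[:leftover]:
        placed[item.provider] += 1
    return placed


strategy = [
    StrategyItem(provider="on-demand", weight=1, base=1),
    StrategyItem(provider="spot", weight=3),
]
print(split_tasks(9, strategy))  # {'on-demand': 3, 'spot': 6}
```

A base on the On-Demand provider keeps at least one task on capacity that cannot be reclaimed, while the 1:3 weights send most of the remaining tasks to Spot.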
Below, I will guide you through a Pulumi program written in Python that sets up an ECS Cluster with a Spot Capacity Provider. The program entails the following resources:
- ECS Cluster: A logical grouping of tasks or services within ECS.
- Spot Capacity Provider: An entity that allows ECS to use Spot Instances as part of the cluster capacity.
- ECS Task Definition: A blueprint for your application that specifies the container definitions and requirements.
- ECS Service: Runs and maintains the desired number of tasks launched from the Task Definition.
We will create these resources using Pulumi's AWS provider.
import json

import pulumi
import pulumi_aws as aws

# Inputs this program assumes already exist. The config keys are placeholders
# (not part of any standard); adapt them to your stack.
config = pulumi.Config()
model_container_image = config.require("modelContainerImage")
subnet_ids = config.require_object("subnetIds")
autoscaling_group = aws.autoscaling.get_group(name=config.require("asgName"))
security_group = aws.ec2.get_security_group(id=config.require("securityGroupId"))
execution_role = aws.iam.get_role(name=config.require("executionRoleName"))

# Create an ECS cluster
ecs_cluster = aws.ecs.Cluster("ecs-cluster")

# Define a capacity provider backed by the Auto Scaling Group. It provides
# Spot capacity only if the group itself launches Spot Instances.
capacity_provider = aws.ecs.CapacityProvider("spot-capacity-provider",
    name="MySpotCapacityProvider",
    auto_scaling_group_provider=aws.ecs.CapacityProviderAutoScalingGroupProviderArgs(
        auto_scaling_group_arn=autoscaling_group.arn,
        managed_scaling=aws.ecs.CapacityProviderAutoScalingGroupProviderManagedScalingArgs(
            status="ENABLED",
            target_capacity=70,  # Keep the group about 70% utilized, leaving headroom
        ),
        # Requires instance scale-in protection to be enabled on the Auto Scaling Group
        managed_termination_protection="ENABLED",
    ),
)

# Ensure that the capacity provider is associated with the ECS cluster
ecs_cluster_capacity_providers = aws.ecs.ClusterCapacityProviders("ecs-cluster-capacity-providers",
    cluster_name=ecs_cluster.name,
    capacity_providers=[capacity_provider.name],
    default_capacity_provider_strategies=[
        aws.ecs.ClusterCapacityProvidersDefaultCapacityProviderStrategyArgs(
            capacity_provider=capacity_provider.name,
            weight=1,
        )
    ],
)

# Create an EC2 task definition for the language model
task_definition = aws.ecs.TaskDefinition("app-task-def",
    family="app-task",
    cpu="1024",      # Illustrative sizes; size these for your model
    memory="2048",
    network_mode="awsvpc",
    requires_compatibilities=["EC2"],
    execution_role_arn=execution_role.arn,
    container_definitions=json.dumps([{
        "name": "app-container",
        "image": model_container_image,
        "cpu": 512,
        "memory": 1024,
        "essential": True,
        "portMappings": [{"containerPort": 8080, "hostPort": 8080}],
        "environment": [{"name": "MODEL_SIZE", "value": "large"}],
    }]),
)

# Create an ECS Service to run and maintain the desired number of tasks
ecs_service = aws.ecs.Service("app-service",
    cluster=ecs_cluster.arn,
    task_definition=task_definition.arn,
    desired_count=3,
    # No launch_type here: a launch type and a capacity provider strategy are mutually exclusive
    capacity_provider_strategies=[aws.ecs.ServiceCapacityProviderStrategyArgs(
        capacity_provider=capacity_provider.name,
        weight=1,
    )],
    network_configuration=aws.ecs.ServiceNetworkConfigurationArgs(
        subnets=subnet_ids,                   # Subnets should be configured according to your VPC setup
        security_groups=[security_group.id],  # The security group assigned to the tasks
        # assign_public_ip is omitted: it is only supported for Fargate tasks
    ),
    # The capacity provider must be attached to the cluster before the service uses it
    opts=pulumi.ResourceOptions(depends_on=[ecs_cluster_capacity_providers]),
)

# Export the service name so it can be easily accessed, for example in CI/CD scripts or other automation
pulumi.export("service_name", ecs_service.name)
In this program:
- We start by creating an ECS Cluster to organize our services.
- Next, a Spot Capacity Provider is defined on top of the Auto Scaling Group. Managed scaling targets 70% utilization of the group's capacity, and managed termination protection keeps instances that are still running tasks from being terminated during scale-in. It does not prevent AWS from interrupting Spot Instances.
- We then ensure that our ECS Cluster is tied to the defined Spot Capacity Provider.
- We create an ECS Task Definition for the EC2 launch type, specifying the container definitions including CPU, memory, and environment variables suited for our language model.
- An ECS Service is instantiated to maintain the desired count of tasks running our language model. It places tasks through the capacity provider strategy rather than a Fargate launch type, so they run on the instances of the Auto Scaling Group.
- Finally, we export the service's name as a stack output for ease of reference.
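The target_capacity=70 setting is easier to reason about with numbers. ECS publishes a CapacityProviderReservation metric, defined as the instances needed divided by the instances running, times 100, and managed scaling adjusts the Auto Scaling Group to hold that metric at the target. The sketch below works through the arithmetic; the function names are hypothetical, and it ignores warm-up periods, scaling step limits, and whatever rounding ECS applies internally.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """The metric managed scaling tracks: needed / running * 100."""
    return instances_needed / instances_running * 100


def instances_to_run(instances_needed: int, target_capacity: int) -> int:
    """Smallest instance count that keeps the reservation at or below the target."""
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    # Integer ceiling of instances_needed * 100 / target_capacity
    return (instances_needed * 100 + target_capacity - 1) // target_capacity


for needed in (3, 7, 14):
    running = instances_to_run(needed, 70)
    print(needed, running, round(capacity_provider_reservation(needed, running), 1))
# 3 5 60.0
# 7 10 70.0
# 14 20 70.0
```

With a target of 100 the group runs exactly as many instances as the tasks need; a lower target such as 70 keeps spare instances available, which can shorten the time to place replacement tasks after a Spot interruption at the price of paying for idle capacity.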
This program assumes that you have an appropriate container image for the language model and an Auto Scaling Group defined; point model_container_image and autoscaling_group.arn at your actual image and Auto Scaling Group ARN. Set subnet_ids and security_group.id according to your VPC and security configurations, and make sure execution_role.arn refers to an IAM task execution role. The capacity provider only yields Spot capacity if the Auto Scaling Group itself is configured to launch Spot Instances.
Provide Pulumi with AWS credentials to manage infrastructure within your account, and execute this Pulumi program in a Python environment where the Pulumi CLI is installed. When you run pulumi up, Pulumi will create this infrastructure on AWS.
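Because Spot Instances can be reclaimed, the model container should expect to be stopped at any time. ECS stops a task by sending SIGTERM to its containers and follows up with SIGKILL once the stop timeout expires. The following is a minimal sketch of a handler the serving process inside the container might install; the serve loop and the request format are hypothetical stand-ins for your own inference code.

```python
import signal
import threading

# Set once ECS (or anything else) asks the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request; the serving loop checks the flag between requests."""
    shutdown_requested.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def serve(requests):
    """Handle queued requests until a shutdown is requested (hypothetical loop)."""
    handled = []
    for request in requests:
        if shutdown_requested.is_set():
            break  # stop taking new work so the process can exit cleanly
        handled.append(f"done:{request}")
    return handled


print(serve(["prompt-1", "prompt-2"]))
```

Checking the flag between requests lets an in-flight inference call finish before the process exits, provided it completes within the task's stop timeout.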