1. Scalable Machine Learning Model Serving with AWS ECS


    To create a scalable machine learning model serving infrastructure with AWS ECS (Elastic Container Service), you would generally follow these steps:

    1. Set Up an ECS Cluster: This is the core set of compute resources used to run your containers. ECS allows you to place containers based on your requirements for availability, security, and scalability.

    2. Define Task Definitions and Services: Task definitions are like blueprints for your application that define how your containers should run (CPU, memory, networking, IAM permissions, etc.). Services keep your tasks running over the long term and ensure that the desired number of tasks is running in your cluster.

    3. Create a Load Balancer (and Target Groups if necessary): To distribute traffic across your containers, you usually use a load balancer that can handle incoming requests and forward them to the appropriate container instances.

    4. Auto Scaling: To handle the scalability part, you will set up ECS Service Auto Scaling policies. These scale the number of tasks up and down based on demand, ensuring that your model serving infrastructure can handle the load when needed and scale down to save costs when demand is low.

    5. Set Up Container Image and Repository: Your machine learning model needs to be containerized, meaning it should be packaged along with all of its dependencies in a Docker container. This container image is then stored in a repository, like Amazon ECR (Elastic Container Registry), from where ECS can pull the image to deploy.
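    Step 5 assumes the container exposes an HTTP interface that the load balancer can health-check. Below is a minimal sketch of such a serving process using only the Python standard library; the `/predict` and `/health` routes and the placeholder "prediction" logic are illustrative assumptions, and a real image would load your trained model instead:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ModelHandler(BaseHTTPRequestHandler):
    """Minimal HTTP front end for a model; /health backs the LB health check."""

    def do_GET(self):
        if self.path == "/health":
            self._respond(200, {"status": "ok"})
        else:
            self._respond(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/predict":
            length = int(self.headers.get("Content-Length", 0))
            features = json.loads(self.rfile.read(length) or b"{}")
            # Placeholder: a real service would call model.predict(features)
            self._respond(200, {"prediction": sum(features.get("x", []))})
        else:
            self._respond(404, {"error": "not found"})

    def _respond(self, code, payload):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Keep request logging quiet; ECS would capture stdout/stderr logs
        pass


if __name__ == "__main__":
    # Demo on an ephemeral port; inside the container you would bind port 80
    # and call serve_forever() instead of shutting down.
    import threading
    import urllib.request

    server = HTTPServer(("127.0.0.1", 0), ModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/health"
    print(json.loads(urllib.request.urlopen(url).read()))  # → {'status': 'ok'}
    server.shutdown()
```

    This process (plus your model artifacts and dependencies) is what you would package in the Dockerfile and push to ECR.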

    Now let's write a Pulumi program that sets up this infrastructure. We'll use pulumi_awsx for high-level components that simplify working with AWS services.

    import json

    import pulumi
    import pulumi_aws as aws
    import pulumi_awsx as awsx

    # Create an ECS cluster to host our services
    ecs_cluster = aws.ecs.Cluster("model-serving-cluster")

    # IAM role that the ECS agent assumes to pull images and write logs.
    # The trust policy allows the ECS tasks service to assume the role.
    task_exec_role = aws.iam.Role("task-exec-role",
        assume_role_policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }],
        }))

    task_exec_policy_attachment = aws.iam.RolePolicyAttachment("task-exec-policy",
        role=task_exec_role.name,
        policy_arn=aws.iam.ManagedPolicy.AMAZON_ECS_TASK_EXECUTION_ROLE_POLICY)

    # Docker image for the machine learning model.
    # Replace the placeholders with your AWS account ID, region, and ECR repository name.
    image = "<aws_account_id_here>.dkr.ecr.<region_here>.amazonaws.com/<repository-name-here>:latest"

    # Application load balancer (in the default VPC) to distribute incoming traffic.
    # Its default target group health-checks each task on /health.
    lb = awsx.lb.ApplicationLoadBalancer("app-lb",
        default_target_group=awsx.lb.TargetGroupArgs(
            port=80,
            health_check=aws.lb.TargetGroupHealthCheckArgs(path="/health")))

    # Fargate service that runs and maintains the desired number of tasks.
    # The container's port mapping wires it into the load balancer's target group,
    # so awsx registers tasks with the target group automatically.
    ecs_service = awsx.ecs.FargateService("app-svc",
        cluster=ecs_cluster.arn,
        desired_count=2,
        task_definition_args=awsx.ecs.FargateServiceTaskDefinitionArgs(
            family="model-serving",
            cpu="256",
            memory="512",
            execution_role=awsx.awsx.DefaultRoleWithPolicyArgs(role_arn=task_exec_role.arn),
            container=awsx.ecs.TaskDefinitionContainerDefinitionArgs(
                name="model-serving",
                image=image,
                cpu=256,
                memory=512,
                essential=True,
                port_mappings=[awsx.ecs.TaskDefinitionPortMappingArgs(
                    container_port=80,
                    target_group=lb.default_target_group)])))

    # Output the URL of the load balancer to access our service
    pulumi.export("app_url", lb.load_balancer.dns_name)

    In the above program, we define the necessary components for deploying a machine learning model serving infrastructure. The ECS cluster is created to run containerized tasks, and the task definition specifies how these tasks should be configured, including the Docker image that contains our machine learning model. We also define an IAM task execution role so that ECS can pull the container image and write logs on our tasks' behalf.

    An application load balancer is created to distribute incoming traffic, ensuring that requests are efficiently routed to ECS service tasks that are instantiated from our task definition. We also set up a listener and a target group to manage the health checks and traffic between the load balancer and the tasks.
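    Note that the program covers steps 1 through 3 and 5, but not step 4. A sketch of ECS Service Auto Scaling with pulumi_aws, appended to the same program, could look like the following; it assumes the `ecs_cluster` and `ecs_service` resources defined earlier, and the exact attribute path to the underlying service name (`ecs_service.service.name` below) can vary by awsx version:

```
# Register the service's DesiredCount as a scalable target (2 to 10 tasks).
scaling_target = aws.appautoscaling.Target("model-scaling-target",
    min_capacity=2,
    max_capacity=10,
    resource_id=pulumi.Output.concat(
        "service/", ecs_cluster.name, "/", ecs_service.service.name),
    scalable_dimension="ecs:service:DesiredCount",
    service_namespace="ecs")

# Target-tracking policy: add or remove tasks to hold average CPU near 60%.
scaling_policy = aws.appautoscaling.Policy("model-scaling-policy",
    policy_type="TargetTrackingScaling",
    resource_id=scaling_target.resource_id,
    scalable_dimension=scaling_target.scalable_dimension,
    service_namespace=scaling_target.service_namespace,
    target_tracking_scaling_policy_configuration=aws.appautoscaling.PolicyTargetTrackingScalingPolicyConfigurationArgs(
        target_value=60.0,
        predefined_metric_specification=aws.appautoscaling.PolicyTargetTrackingScalingPolicyConfigurationPredefinedMetricSpecificationArgs(
            predefined_metric_type="ECSServiceAverageCPUUtilization")))
```

    Target tracking is usually the simplest choice here, since Application Auto Scaling manages the CloudWatch alarms for you; step scaling is an alternative when you need finer control over thresholds.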

    To use this program, replace <aws_account_id_here>, <region_here>, and <repository-name-here> with your actual AWS account ID, region, and the name of the ECR repository where your machine learning model's Docker image is stored, then run `pulumi up`. Pulumi will create the infrastructure on AWS ECS for serving a machine learning model in a scalable manner.