Autoscaling Deep Learning Inference with AWS EKS on Fargate

Question

Pulumi · Accepted Answer

To create an autoscaling deep learning inference solution using Amazon EKS on Fargate, we will combine several AWS services and Kubernetes resources. The main pieces are an EKS cluster, a Fargate profile that runs our pods on serverless compute, an ECR repository for the inference image, a Kubernetes Deployment, and a Horizontal Pod Autoscaler that scales the Fargate pods based on CPU or memory usage.

Here are the steps we will follow:

EKS Cluster: We'll start by provisioning an Amazon Elastic Kubernetes Service (EKS) cluster, which is a managed Kubernetes service that makes it easy to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane.
Fargate Profile: Once we have the EKS cluster, we'll create a Fargate profile. Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and EKS. With Fargate, you don't need to provision or manage servers. On EKS, the profile's selectors (a namespace plus optional labels) decide which pods run on Fargate, and Fargate provisions dedicated compute for each matching pod.
ECR Repository: To store the deep learning inference container images, we'll use Amazon Elastic Container Registry (ECR), a fully managed container registry for storing, managing, and deploying Docker container images.
Kubernetes Deployment: Once our EKS cluster and Fargate profile are ready, and our container image is stored in an ECR repository, we'll define a Kubernetes Deployment. This will ensure that a specified number of pod replicas are running at any one time.
Horizontal Pod Autoscaler (HPA): The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas in a workload such as a Deployment or ReplicaSet, based on observed CPU or memory utilization (measured against the pods' resource requests) or on custom metrics.
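To make the scaling behavior concrete, here is a minimal Python sketch of the core replica calculation that the Kubernetes documentation gives for the HPA: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance (0.1 by default) of 1.0. The function name and the min/max defaults are illustrative assumptions, and the sketch deliberately omits the readiness handling, missing-metric handling, and scale-down stabilization window that the real controller applies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA's replica calculation for a single resource metric."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the replica count stays as it is.
        desired = current_replicas
    else:
        # Scale proportionally to how far the metric is from its target.
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU against a 60% target: ceil(3 * 1.5) = 5.
    print(desired_replicas(3, 90.0, 60.0))  # prints 5
```

The tolerance band is what keeps small metric fluctuations from causing constant scale-up and scale-down churn.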

Now, let's proceed with the Pulumi Python program that sets up this infrastructure:

import pulumi
import pulumi_aws as aws

# Environment-specific values come from the Pulumi stack configuration, e.g.:
#   pulumi config set eksServiceRoleArn <role-arn>
#   pulumi config set --path 'privateSubnetIds[0]' <subnet-id>
config = pulumi.Config()
eks_service_role_arn = config.require("eksServiceRoleArn")
fargate_pod_execution_role_arn = config.require("fargatePodExecutionRoleArn")
public_subnet_ids = config.require_object("publicSubnetIds")
private_subnet_ids = config.require_object("privateSubnetIds")

# Define the EKS cluster (the control plane; Fargate supplies the compute)
eks_cluster = aws.eks.Cluster(
    "eksCluster",
    role_arn=eks_service_role_arn,
    vpc_config=aws.eks.ClusterVpcConfigArgs(
        endpoint_public_access=True,
        # Reachable from any address; narrow this list for production.
        public_access_cidrs=["0.0.0.0/0"],
        subnet_ids=public_subnet_ids,
    ),
)

# Define the EKS Fargate profile. Pods in a selected namespace are scheduled
# onto Fargate; add `labels` to a selector to narrow the match.
fargate_profile = aws.eks.FargateProfile(
    "fargateProfile",
    cluster_name=eks_cluster.name,
    pod_execution_role_arn=fargate_pod_execution_role_arn,
    selectors=[
        # Namespace for the inference Deployment.
        aws.eks.FargateProfileSelectorArgs(namespace="default"),
        # With no EC2 nodes, system pods such as CoreDNS and the Metrics
        # Server also need Fargate capacity.
        aws.eks.FargateProfileSelectorArgs(namespace="kube-system"),
    ],
    # Fargate launches pods only into private subnets.
    subnet_ids=private_subnet_ids,
)

# Create an ECR repository for the inference image. Build and push the image
# here before deploying the Kubernetes workload that uses it.
ecr_repo = aws.ecr.Repository(
    "deepLearningInferenceRepo",
    image_tag_mutability="MUTABLE",
    image_scanning_configuration=aws.ecr.RepositoryImageScanningConfigurationArgs(
        scan_on_push=True,
    ),
)

pulumi.export('ecr_repo_url', ecr_repo.repository_url)
pulumi.export('eks_cluster_name', eks_cluster.name)
pulumi.export('fargate_profile_name', fargate_profile.name)

In this code, the values you must supply are read from the stack configuration:

eksServiceRoleArn is the ARN of the IAM service role for the EKS cluster.
fargatePodExecutionRoleArn is the ARN of the pod execution role that the Fargate profile uses (for example, to pull images from ECR).
publicSubnetIds is a list of subnet IDs for your EKS cluster.
privateSubnetIds is a list of private subnet IDs that Fargate launches pods into.

We define the EKS cluster, the Fargate profile, and an ECR repository to store our container images. The deep learning workload itself is not an AWS resource in this program: on EKS it runs as a Kubernetes Deployment, and its pods land on Fargate because their namespace matches the profile's selectors. An ECS service such as awsx.ecs.FargateService would run on Amazon ECS instead of the EKS cluster, so a Kubernetes HPA could not scale it. The container image must be built and pushed to the ECR repository whose URL the program exports as ecr_repo_url.
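Because each pod that matches a Fargate profile runs on its own compute sized from the pod's resource requests, those requests decide what you pay for. The sketch below models the sizing rule described in the AWS Fargate pod configuration documentation: Fargate adds 256 MB (treated here as 256 MiB) for Kubernetes components (kubelet, kube-proxy, containerd), then rounds up to the nearest supported vCPU/memory combination. Assumptions: the table covers only the 0.25 to 4 vCPU rows as documented at the time of writing, the function name is hypothetical, and the inputs are the pod's effective requests (the larger of the biggest init-container request and the sum of the long-running containers' requests). Check the current AWS documentation before relying on the numbers.

```python
# Fargate vCPU/memory combinations as (millicores, MiB), assumed from the
# 0.25-4 vCPU rows of the AWS documentation; verify against current docs.
FARGATE_COMBINATIONS = sorted(
    [(250, mem) for mem in (512, 1024, 2048)]
    + [(500, mem) for mem in range(1024, 4096 + 1, 1024)]
    + [(1000, mem) for mem in range(2048, 8192 + 1, 1024)]
    + [(2000, mem) for mem in range(4096, 16384 + 1, 1024)]
    + [(4000, mem) for mem in range(8192, 30720 + 1, 1024)]
)

# Memory Fargate adds to every pod for the Kubernetes components.
KUBERNETES_OVERHEAD_MIB = 256


def fargate_pod_size(cpu_millicores: int, memory_mib: int) -> tuple[int, int]:
    """Return the smallest listed combination that covers the pod's requests."""
    needed_memory = memory_mib + KUBERNETES_OVERHEAD_MIB
    # Combinations are ordered by vCPU, then memory, so the first fit is smallest.
    for cpu, memory in FARGATE_COMBINATIONS:
        if cpu >= cpu_millicores and memory >= needed_memory:
            return cpu, memory
    raise ValueError("requests exceed the combinations listed in this sketch")


if __name__ == "__main__":
    # 500m CPU and 2Gi memory need 2304 MiB after the overhead.
    print(fargate_pod_size(500, 2048))  # prints (500, 3072)
```

In other words, a pod requesting 500m CPU and 2Gi memory is billed as 0.5 vCPU / 3 GB rather than 0.5 vCPU / 2 GB, which is worth keeping in mind when choosing requests for the inference containers.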

The next step is to deploy your Kubernetes resources, such as a Deployment and a Service, to the EKS cluster and to configure a Horizontal Pod Autoscaler. This part goes beyond infrastructure provisioning into application deployment on Kubernetes, which is usually expressed as Kubernetes manifests or Helm charts.
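As one way to express those resources, the sketch below builds a Deployment and a ClusterIP Service as plain Python dictionaries and prints them as a JSON `List`, which `kubectl apply -f -` accepts on standard input. The names (`inference` and the `app` label), port 5000, the `default` namespace, and the 500m CPU / 2Gi memory requests are assumptions for illustration. The namespace must be one that the Fargate profile selects, and the image placeholder must be replaced with the image you pushed to the ECR repository.

```python
import json

# Hypothetical names used throughout this sketch; adjust to your application.
APP_LABEL = "inference"
NAMESPACE = "default"  # must be a namespace selected by the Fargate profile
CONTAINER_PORT = 5000  # assumed port of the inference server


def inference_manifests(image: str, replicas: int = 1) -> dict:
    """Build a Deployment and a ClusterIP Service as one kubectl-ready List."""
    labels = {"app": APP_LABEL}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP_LABEL, "namespace": NAMESPACE, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": APP_LABEL,
                        "image": image,
                        "ports": [{"containerPort": CONTAINER_PORT}],
                        # Requests size the Fargate pod and are the baseline
                        # for the HPA's utilization percentages.
                        "resources": {"requests": {"cpu": "500m", "memory": "2Gi"}},
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",  # type defaults to ClusterIP
        "metadata": {"name": APP_LABEL, "namespace": NAMESPACE},
        "spec": {
            "selector": labels,
            "ports": [{"port": 80, "targetPort": CONTAINER_PORT}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    # Placeholder image URI: use the ecr_repo_url stack output plus a tag.
    print(json.dumps(inference_manifests("<ecr_repo_url>:latest"), indent=2))
```

For example, if the script is saved as `manifests.py` (a hypothetical file name) and the placeholder is filled in, `python manifests.py | kubectl apply -f -` creates both objects.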

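To complete the autoscaling setup, deploy those Kubernetes resources and create an HPA (Horizontal Pod Autoscaler) that targets your Deployment. CPU- or memory-based scaling has two prerequisites: each container must declare resource requests, because the HPA measures utilization as a percentage of the request, and the cluster must run the Kubernetes Metrics Server, which you typically install yourself on EKS. This configuration usually lives in Kubernetes manifest files alongside the application code in the deployment pipeline, although Pulumi can also manage it through the pulumi_kubernetes package.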
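An HPA for the inference Deployment can also be written as a plain Python dictionary and printed as JSON for kubectl. This sketch uses the `autoscaling/v2` API and scales on average CPU utilization. The Deployment name `inference` is hypothetical, and the 1 to 10 replica range and the 70% target are placeholder values rather than recommendations. Changing the resource name from `"cpu"` to `"memory"` scales on memory instead.

```python
import json


def inference_hpa(
    deployment_name: str = "inference",  # hypothetical Deployment name
    namespace: str = "default",
    min_replicas: int = 1,
    max_replicas: int = 10,
    target_cpu_percent: int = 70,
) -> dict:
    """Build an autoscaling/v2 HPA that scales a Deployment on average CPU."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment_name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Percentage of the pods' CPU requests, averaged over pods.
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": target_cpu_percent,
                    },
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(inference_hpa(), indent=2))
```

Pipe the output to `kubectl apply -f -`; once the Metrics Server reports pod metrics, `kubectl get hpa` shows the current and target utilization.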