Serverless Deep Learning with AWS Fargate on EKS Clusters
To create a serverless deep learning environment using AWS Fargate on Amazon Elastic Kubernetes Service (EKS), we'll define a combination of AWS services with Pulumi's Infrastructure as Code (IaC) approach in Python. Here's an overview of what we need to do:
- Set up an EKS cluster: Amazon Elastic Kubernetes Service (EKS) is Amazon's managed Kubernetes service, which lets you run Kubernetes on AWS without managing the underlying infrastructure.
- Configure a Fargate profile for EKS: AWS Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and EKS. We'll use it to run our containers without having to manage or scale a fleet of virtual machines.
- Define an ECR repository: Amazon Elastic Container Registry (ECR) is a container image registry service. We'll use it to store our deep learning container images.
- Deploy a Kubernetes application: We'll create a Kubernetes Deployment and Service to run our deep learning model in a serverless manner on EKS with Fargate.
Now, let's build this infrastructure with Pulumi in Python:
```python
import json

import pulumi
import pulumi_aws as aws
import pulumi_awsx as awsx
import pulumi_eks as eks
import pulumi_kubernetes as k8s

# Step 1: Create an EKS Cluster.
# A dedicated VPC gives the Fargate profile the private subnets it requires.
vpc = awsx.ec2.Vpc('deep-learning-vpc')

cluster = eks.Cluster('deep-learning-eks',
                      vpc_id=vpc.vpc_id,
                      private_subnet_ids=vpc.private_subnet_ids,
                      public_subnet_ids=vpc.public_subnet_ids,
                      create_oidc_provider=True,
                      # No EC2 node group: all workloads will run on Fargate.
                      skip_default_node_group=True)

# Step 2: Configure a Fargate Profile for EKS.
# Fargate pods require an execution role trusted by eks-fargate-pods.amazonaws.com.
fargate_role = aws.iam.Role('fargate-pod-execution-role',
    assume_role_policy=json.dumps({
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Principal': {'Service': 'eks-fargate-pods.amazonaws.com'},
            'Action': 'sts:AssumeRole',
        }],
    }))

aws.iam.RolePolicyAttachment('fargate-pod-execution-policy',
    role=fargate_role.name,
    policy_arn='arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy')

# This profile schedules every pod in the 'default' namespace that carries the
# label 'runtime: fargate' onto Fargate.
fargate_profile = aws.eks.FargateProfile('deep-learning-fargate-profile',
    cluster_name=cluster.eks_cluster.name,
    pod_execution_role_arn=fargate_role.arn,
    subnet_ids=vpc.private_subnet_ids,
    selectors=[aws.eks.FargateProfileSelectorArgs(
        namespace='default',
        labels={'runtime': 'fargate'},
    )])

# Step 3: Define an ECR Repository.
# The deep learning application deployed below needs a Docker image stored in ECR.
ecr_repo = aws.ecr.Repository('deep-learning-repo')

# For simplicity, we define placeholder values for the deep learning application.
# In a real-world scenario, you would build and push a Docker image containing
# the code for your deep learning model to this ECR repository.
app_name = 'dl-app'
app_tag = 'v1'  # Placeholder image tag

# Step 4: Deploy a Kubernetes Application.
# The pod labels must match the Fargate profile's selector so the deep learning
# pods run serverlessly on Fargate.
app_labels = {'app': app_name, 'runtime': 'fargate'}

k8s_provider = k8s.Provider('deep-learning-k8s',
    # The kubeconfig output is a dict; the provider expects a string.
    kubeconfig=cluster.kubeconfig.apply(json.dumps))

app_deployment = k8s.apps.v1.Deployment('deep-learning-app-deployment',
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=1,
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(containers=[
                k8s.core.v1.ContainerArgs(
                    name=app_name,
                    # Normally you would reference your own deep learning image here.
                    image=ecr_repo.repository_url.apply(lambda url: f'{url}:{app_tag}'),
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        requests={'cpu': '2', 'memory': '8Gi'},
                        limits={'cpu': '4', 'memory': '16Gi'}),
                    ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
                )]))),
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[fargate_profile]))

# Expose an endpoint for the application through an AWS load balancer.
app_service = k8s.core.v1.Service('deep-learning-app-service',
    spec=k8s.core.v1.ServiceSpecArgs(
        type='LoadBalancer',
        selector=app_labels,
        ports=[k8s.core.v1.ServicePortArgs(port=80, target_port=80)]),
    opts=pulumi.ResourceOptions(provider=k8s_provider))

# Export the cluster kubeconfig and the application endpoint for easy access.
pulumi.export('kubeconfig', cluster.kubeconfig)
pulumi.export('app_endpoint', app_service.status.apply(
    lambda s: s.load_balancer.ingress[0].hostname
    if s.load_balancer and s.load_balancer.ingress else None))
```
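Once the stack is up, the exported kubeconfig can be used to confirm the pods landed on Fargate. A hedged sketch (the output name `kubeconfig` matches the export above; the file name is illustrative):

```shell
# Write the exported kubeconfig to a local file.
pulumi stack output kubeconfig > kubeconfig.json

# Fargate-scheduled pods run on dedicated "fargate-*" nodes, so listing nodes
# and pods confirms that the profile's selector matched.
kubectl --kubeconfig kubeconfig.json get nodes
kubectl --kubeconfig kubeconfig.json get pods -n default -o wide
```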
In this Pulumi program:
- We create an EKS cluster to run our workloads using the `eks.Cluster` class.
- We add a Fargate profile targeting a specific namespace and pod labels using `aws.eks.FargateProfile`, together with the IAM pod execution role that Fargate requires.
- We declare an ECR repository to store our container images using `aws.ecr.Repository`.
- We simulate deploying a deep learning application as a Kubernetes `Deployment` exposed through a `LoadBalancer` `Service`, though we've used placeholders for the container image, as you'd typically build and push your image to ECR.
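Building and pushing that image to the ECR repository typically follows the standard Docker/ECR flow. A hedged sketch, where the account ID, region, and repository name are placeholders — the real repository URL comes from the `repository_url` output of the `aws.ecr.Repository` resource (Pulumi also appends a random suffix to resource names):

```shell
# Authenticate Docker to ECR (account ID and region are placeholders).
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Build the deep learning image, then tag and push it against the repository
# URL created by Pulumi, using the same tag the program references ('v1').
docker build -t deep-learning-image .
docker tag deep-learning-image 123456789012.dkr.ecr.us-east-1.amazonaws.com/deep-learning-repo:v1
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/deep-learning-repo:v1
```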
Run the Pulumi program with `pulumi up`. After completing the deployment, you'll get the Kubernetes kubeconfig and the application endpoint, which you can use to interact with the deployed application.
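For example, once the `app_endpoint` output resolves, a small client can send inference requests to the service. This is a sketch that assumes the container serves a JSON `/predict` route on port 80 — the route name and payload shape are illustrative, not something the Pulumi program defines:

```python
import json
import urllib.request


def predict(endpoint: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST a JSON payload to the model endpoint and return the JSON response.

    `endpoint` is the load balancer hostname exported as `app_endpoint`;
    the `/predict` route is an assumption about the container's API.
    """
    req = urllib.request.Request(
        f"http://{endpoint}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `predict("<app_endpoint hostname>", {"inputs": [1.0, 2.0]})`, with the payload shaped to whatever your model container expects.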