Real-time Inference Services with AWS EKS

Question

Pulumi · Accepted Answer

Real-time inference services typically involve deploying a machine learning model on a cloud platform so that it can receive data and return predictions. On AWS, a common way to do this is with Amazon EKS (Elastic Kubernetes Service), which runs the Kubernetes control plane for you, so you do not have to install and operate it yourself. Worker nodes are provisioned separately, for example through node groups.

In this program, we will provision the infrastructure for a real-time inference service on Amazon EKS. We will set up an EKS cluster and an ECR (Elastic Container Registry) repository to store our Docker images. The inference service itself is deployed as a Kubernetes Deployment, which manages the pods running the service, and is exposed through a Kubernetes Service, which routes traffic to those pods. EKS lets us scale the service by adding pods and worker nodes.

Here's a Pulumi program in Python that sets up an EKS cluster and the supporting resources for a real-time inference service. It creates the following components:

1. **ECR Repository**: The container registry where we store the Docker images used for our inference service.
2. **EKS Cluster**: The Kubernetes control plane where our service will be scheduled and managed.
3. **IAM Role**: The role that grants the EKS cluster the permissions it needs.
4. **EKS Node Group**: A set of worker nodes that will run our Kubernetes pods.
5. **VPC, Subnets, and Security Group**: The network the cluster and its worker nodes run in.

Let's begin the Pulumi program with these components:

```python
import pulumi
import pulumi_aws as aws
import pulumi_eks as eks

# Create an ECR repository to store our Docker images.
ecr_repo = aws.ecr.Repository("inferenceServiceRepo")

# Output the ECR repository URL that will be used to tag our Docker images later.
pulumi.export("repo_url", ecr_repo.repository_url)

# IAM role for the EKS cluster with basic permissions.
# For a real case, attach the policies according to the specific needs.
eks_role = aws.iam.Role("eksRole", assume_role_policy="""
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Service": "eks.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }]
}
""")

# Attach the AmazonEKSClusterPolicy to the role created above.
eks_policy_attachment = aws.iam.RolePolicyAttachment("eksPolicyAttachment",
                                                     role=eks_role.name,
                                                     policy_arn="arn:aws:iam::aws:policy/AmazonEKSClusterPolicy")

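# VPC and subnets for the EKS cluster.
vpc = aws.ec2.Vpc("eksVpc",
                  cidr_block="10.100.0.0/16",
                  enable_dns_support=True,
                  enable_dns_hostnames=True)

# Internet gateway and default route so nodes in the public subnets can reach
# the EKS API endpoint and pull images from ECR.
igw = aws.ec2.InternetGateway("eksInternetGateway", vpc_id=vpc.id)
route_table = aws.ec2.RouteTable("eksRouteTable",
                                 vpc_id=vpc.id,
                                 routes=[aws.ec2.RouteTableRouteArgs(cidr_block="0.0.0.0/0",
                                                                     gateway_id=igw.id)])

# Pulumi resources have no `count` argument, so create one subnet per
# availability zone in a loop.
subnets = []
for index, zone in enumerate(["us-west-2a", "us-west-2b"]):
    subnet = aws.ec2.Subnet(f"eksSubnet-{index}",
                            vpc_id=vpc.id,
                            cidr_block=f"10.100.{index + 1}.0/24",
                            availability_zone=zone,
                            map_public_ip_on_launch=True)
    aws.ec2.RouteTableAssociation(f"eksRouteTableAssociation-{index}",
                                  subnet_id=subnet.id,
                                  route_table_id=route_table.id)
    subnets.append(subnet)
subnet_ids = [subnet.id for subnet in subnets]

# Security group attached to the cluster. It admits HTTP(S) traffic from anywhere;
# tighten the CIDR ranges for anything beyond a demo.
eks_security_group = aws.ec2.SecurityGroup("eksSecurityGroup",
                                           vpc_id=vpc.id,
                                           description="Allow all HTTP(s) traffic to EKS",
                                           ingress=[
                                               aws.ec2.SecurityGroupIngressArgs(protocol="tcp", from_port=80, to_port=80, cidr_blocks=["0.0.0.0/0"]),
                                               aws.ec2.SecurityGroupIngressArgs(protocol="tcp", from_port=443, to_port=443, cidr_blocks=["0.0.0.0/0"]),
                                           ],
                                           egress=[
                                               aws.ec2.SecurityGroupEgressArgs(protocol="-1", from_port=0, to_port=0, cidr_blocks=["0.0.0.0/0"]),
                                           ])

# Worker nodes are EC2 instances, so they need their own role that EC2 (not EKS)
# can assume, with the standard managed policies for EKS nodes.
node_role = aws.iam.Role("eksNodeRole", assume_role_policy="""
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
""")
node_policy_attachments = [
    aws.iam.RolePolicyAttachment(f"eksNodePolicyAttachment-{policy}",
                                 role=node_role.name,
                                 policy_arn=f"arn:aws:iam::aws:policy/{policy}")
    for policy in ["AmazonEKSWorkerNodePolicy", "AmazonEKS_CNI_Policy", "AmazonEC2ContainerRegistryReadOnly"]
]

# Create the EKS cluster itself. The default node group is skipped because
# a managed node group is defined explicitly below.
cluster = eks.Cluster("eksCluster",
                      service_role=eks_role,
                      vpc_id=vpc.id,
                      subnet_ids=subnet_ids,
                      cluster_security_group=eks_security_group,
                      instance_roles=[node_role],
                      skip_default_node_group=True,
                      opts=pulumi.ResourceOptions(depends_on=[eks_policy_attachment]))

# Managed node group: EKS creates and manages the worker nodes for the cluster.
node_group = aws.eks.NodeGroup("eksNodeGroup",
                               cluster_name=cluster.eks_cluster.name,
                               node_role_arn=node_role.arn,
                               subnet_ids=subnet_ids,
                               scaling_config=aws.eks.NodeGroupScalingConfigArgs(desired_size=2, max_size=2, min_size=1),
                               opts=pulumi.ResourceOptions(depends_on=node_policy_attachments))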

# Output the EKS cluster name and endpoint which can be used to configure kubectl.
pulumi.export("eks_cluster_name", cluster.eks_cluster.name)
pulumi.export("eks_cluster_endpoint", cluster.eks_cluster.endpoint)
```

This program begins by importing the required Pulumi packages. It then defines the ECR repository, the EKS cluster, and the node group, along with the IAM roles, VPC, subnets, and security group they depend on. Make sure to replace the CIDR blocks, AWS region, availability zones, and number of subnets with values appropriate for your setup.
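If you change the VPC range or the number of availability zones, the subnet CIDR blocks have to change with them. The following is a minimal sketch of one way to derive them with Python's standard `ipaddress` module rather than hard-coding each block. The helper name `plan_subnets` is hypothetical and not part of Pulumi; it only mirrors the `10.100.1.0/24`, `10.100.2.0/24` numbering used in the program.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, zones: list[str], prefix: int = 24) -> list[tuple[str, str]]:
    """Pair each availability zone with a subnet CIDR carved out of the VPC range."""
    network = ipaddress.ip_network(vpc_cidr)
    # Skip the first block so numbering starts at x.x.1.0, as in the program.
    blocks = list(islice(network.subnets(new_prefix=prefix), 1, len(zones) + 1))
    if len(blocks) < len(zones):
        raise ValueError("VPC range is too small for the requested number of zones")
    return [(zone, str(block)) for zone, block in zip(zones, blocks)]


if __name__ == "__main__":
    for zone, cidr in plan_subnets("10.100.0.0/16", ["us-west-2a", "us-west-2b"]):
        print(zone, cidr)
```

Each `(zone, cidr)` pair could then feed the `availability_zone` and `cidr_block` arguments of one `aws.ec2.Subnet` resource.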

Once `pulumi up` has created the ECR repository, build the Docker image for your real-time inference service and push it to that repository, using the exported `repo_url` to tag it. You can then define your Kubernetes Deployment and Service resources in Pulumi, pointing them at the image you pushed to ECR and configuring them to handle real-time inference requests.
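As a rough illustration of the build-and-push step, the sketch below assembles the usual AWS CLI and Docker commands from the `repo_url` stack output. It assumes the `aws` and `docker` CLIs are installed and authenticated; the function names, the `latest` tag, and the example account ID are placeholders, not values from the program.

```python
import shlex
import subprocess


def ecr_push_commands(repo_url: str, region: str, tag: str = "latest", context: str = ".") -> list[str]:
    """Return the shell commands that log in to ECR, then build and push the image."""
    # The registry host is the part of the repository URL before the first slash.
    registry = shlex.quote(repo_url.split("/", 1)[0])
    image = shlex.quote(f"{repo_url}:{tag}")
    return [
        f"aws ecr get-login-password --region {shlex.quote(region)}"
        f" | docker login --username AWS --password-stdin {registry}",
        f"docker build -t {image} {shlex.quote(context)}",
        f"docker push {image}",
    ]


def push_image(repo_url: str, region: str, tag: str = "latest") -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in ecr_push_commands(repo_url, region, tag):
        subprocess.run(command, shell=True, check=True)


if __name__ == "__main__":
    # Placeholder URL; the real one comes from `pulumi stack output repo_url`.
    example_url = "123456789012.dkr.ecr.us-west-2.amazonaws.com/inference"
    for command in ecr_push_commands(example_url, "us-west-2", tag="v1"):
        print(command)
```

Printing the commands first, as the example does, is a cheap way to check them before calling `push_image`.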

Keep in mind that this code is meant to be part of a larger Pulumi program that also includes the Kubernetes Deployment definition, the Docker build and push steps, and any additional AWS resources your service needs. Scope the IAM policies to the exact permissions your use case requires.
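To make the Kubernetes side more concrete, here is a minimal sketch of the Deployment and Service such a program would need, written as plain Python dictionaries in standard Kubernetes manifest form. The name `inference`, the container port `8080`, the replica count, and the `LoadBalancer` service type are all assumptions for illustration. The same fields should map onto the Deployment and Service resources of Pulumi's Kubernetes provider, but that wiring is not shown here.

```python
import json


def inference_manifests(image: str, name: str = "inference", replicas: int = 2, port: int = 8080) -> list[dict]:
    """Build a Deployment and a Service for an inference container."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # LoadBalancer asks the cloud provider for an external endpoint.
            "type": "LoadBalancer",
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    return [deployment, service]


if __name__ == "__main__":
    # Placeholder image reference; use the tag you pushed to ECR.
    items = inference_manifests("123456789012.dkr.ecr.us-west-2.amazonaws.com/inference:v1")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The printed JSON is a Kubernetes `List`, so it can be piped to `kubectl apply -f -` for a quick test against the cluster before the resources are moved into the Pulumi program.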