High-Availability EKS Nodegroups for AI Inference Services

Pulumi · Accepted Answer

To create high-availability EKS node groups suitable for AI Inference Services, you would use the `ManagedNodeGroup` component from the `pulumi_eks` package. This package provides a high-level abstraction over AWS EKS resources, simplifying the setup of an EKS cluster and its node groups.

AI inference workloads typically need nodes with specialized hardware such as GPUs. AWS offers GPU-equipped instance families such as `p2`, `p3`, and `g4` for this purpose. For high availability, the node group should also span multiple availability zones (AZs), which you control through the subnets the node group is allowed to use.
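As a rough illustration of the multi-AZ point, the sketch below picks one subnet per availability zone and fails early if too few zones are covered. The resulting list is the kind of value you could pass as `subnet_ids` to a node group. The subnet IDs, zone names, and the `pick_multi_az_subnets` helper are hypothetical; in a real program the subnet-to-AZ mapping would come from your own VPC rather than a hard-coded list.

```python
from typing import Dict, Iterable, List, Tuple


def pick_multi_az_subnets(subnets: Iterable[Tuple[str, str]], min_zones: int = 2) -> List[str]:
    """Return one subnet ID per availability zone, in first-seen order.

    `subnets` is an iterable of (subnet_id, availability_zone) pairs.
    Raises ValueError if fewer than `min_zones` distinct zones are covered.
    """
    chosen: Dict[str, str] = {}
    for subnet_id, zone in subnets:
        # Keep only the first subnet seen in each zone.
        chosen.setdefault(zone, subnet_id)
    if len(chosen) < min_zones:
        raise ValueError(f"need subnets in at least {min_zones} zones, found {len(chosen)}")
    return list(chosen.values())


# Hypothetical subnets; replace with the subnets of your own VPC.
example_subnets = [
    ("subnet-aaa", "us-east-1a"),
    ("subnet-bbb", "us-east-1a"),
    ("subnet-ccc", "us-east-1b"),
    ("subnet-ddd", "us-east-1c"),
]
selected = pick_multi_az_subnets(example_subnets)
print(selected)
```

Failing fast when only one zone is available is a deliberate choice here: a node group confined to a single AZ would still deploy, but it would not survive a zone outage.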

To begin, you'll need to create an EKS cluster using `eks.Cluster` and then create one or more managed node groups with the necessary configurations for AI Inference Services. Below is the program to create a high-availability EKS cluster with GPU-enabled node groups distributed across multiple AZs.

```python
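import json

import pulumi
import pulumi_aws as aws
import pulumi_eks as eks

# IAM role for the worker nodes; a managed node group needs one passed in explicitly.
node_role = aws.iam.Role('ai-inference-node-role',
    assume_role_policy=json.dumps({
        'Version': '2012-10-17',
        'Statement': [{'Effect': 'Allow', 'Action': 'sts:AssumeRole',
                       'Principal': {'Service': 'ec2.amazonaws.com'}}],
    }),
)
# Attach the AWS managed policies that EKS worker nodes require.
for name in ['AmazonEKSWorkerNodePolicy', 'AmazonEKS_CNI_Policy', 'AmazonEC2ContainerRegistryReadOnly']:
    aws.iam.RolePolicyAttachment(f'ai-inference-{name}',
        role=node_role.name,
        policy_arn=f'arn:aws:iam::aws:policy/{name}',
    )

# Create an EKS Cluster.
cluster = eks.Cluster('ai-inference-eks-cluster',
    # Pin the Kubernetes version to a release that EKS currently supports.
    version='1.31',
    skip_default_node_group=True,  # Nodes come from the managed node group below.
    instance_roles=[node_role],  # Allow nodes using this role to join the cluster.
    # More settings can be configured for fine-grained control such as VPC configuration, security groups, etc.
)

# Create a GPU-enabled EKS Node Group for AI Inference within the cluster.
gpu_node_group = eks.ManagedNodeGroup('ai-inference-gpu-node-group',
    cluster=cluster.core,  # Associate the node group with the created cluster.
    node_group_name='ai-inference-nodes',  # A meaningful name for the node group.
    node_role=node_role,  # IAM role the nodes run under.
    instance_types=['g4dn.xlarge'],  # Select a GPU-enabled instance type, e.g., G4 instances.
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(
        desired_size=2,  # Number of desired nodes in the node group, can scale as required.
        min_size=1,  # Minimum size of the node group.
        max_size=4,  # Maximum size could be higher depending on your inference load.
    ),
    labels={'workload-type': 'ai-inference'},  # Label the node group for workload separation.
    tags={'Purpose': 'AIInference'},  # Tagging resources is good practice for cost-tracking.
    # For high availability, place the node group in subnets from multiple AZs via `subnet_ids`.
    # When omitted, the node group uses the cluster's subnets.
)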

# The cluster's kubeconfig can be exported or saved to be used by CI/CD systems or kubectl from a local machine.
# Saving the kubeconfig is not recommended for production use due to security considerations.
pulumi.export('kubeconfig', cluster.kubeconfig)

# GPU nodes need NVIDIA drivers and the NVIDIA device plugin before pods can request `nvidia.com/gpu`.
# Use an EKS GPU-optimized AMI (selected with `ami_type`) or install them with a Kubernetes DaemonSet post-deployment.
```

In this program:

1. We create an EKS cluster with a specified Kubernetes version using `eks.Cluster`.
2. We define a managed node group using `eks.ManagedNodeGroup`, which is suited for AI workloads using GPU-enabled instances (`g4dn.xlarge` in this example).
3. The desired capacity, minimum size, maximum size, labels, and tags are specified for the node group. These can be adjusted based on your specific workload requirements.
4. The comments note how to handle multi-AZ placement, which is crucial for high availability. Nodes are spread across AZs by the subnets the node group uses, so the exact configuration depends on your AWS environment, such as which subnets are available in each AZ.
5. We export the kubeconfig for the EKS cluster. This kubeconfig can be used to interact with your Kubernetes cluster using tools like `kubectl`.

Remember to replace placeholder values with actual values according to your environment, and ensure you have the necessary IAM roles and policies set up for the EKS cluster and node groups.
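Spreading nodes across AZs only helps if the inference pods are spread as well. The following is a minimal sketch, not part of the program above, that builds a Kubernetes Deployment manifest which targets the `workload-type: ai-inference` label, requests one GPU per replica, and asks the scheduler to balance replicas across zones. The `inference_deployment` helper, the Deployment name, and the image are hypothetical placeholders, and the GPU limit assumes the NVIDIA device plugin is running on the nodes.

```python
import json


def inference_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Build a Deployment manifest for GPU inference pods spread across zones."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule only onto nodes carrying the node group's label.
                    "nodeSelector": {"workload-type": "ai-inference"},
                    # Keep the per-zone replica counts within one of each other.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{
                        "name": name,
                        "image": image,
                        # One GPU per replica, exposed by the NVIDIA device plugin.
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }],
                },
            },
        },
    }


# Hypothetical name and image; replace with your own inference server.
manifest = inference_deployment("inference-server", "example.com/inference:latest")
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output can be saved to a file and applied with `kubectl apply -f` using the exported kubeconfig.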

For more details and advanced configurations, you can refer to the [EKS Managed Node Group](https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/) documentation.