1. Serving Large Language Models using AKS


    Azure Kubernetes Service (AKS) is Azure's managed container orchestration service, built on the open-source Kubernetes system. AKS simplifies the deployment, management, and scaling of containerized applications.

    In the context of serving large language models, AKS can host and manage the infrastructure these models need for operations such as inference. Such models are typically part of machine learning or AI applications with significant computational requirements, and AKS is a suitable platform because it abstracts away much of the complexity of managing Kubernetes clusters while providing scalability and reliability.

    Here’s how you might use Pulumi to create an AKS cluster which could be used for serving large language models:

    1. Create an AKS Cluster: Define an AKS managed cluster with the necessary configurations such as node size, node count, and RBAC (Role-Based Access Control), which is often required for managing access to Kubernetes resources.
    2. Scaling: Decide on the autoscaling parameters for your AKS cluster. AKS supports cluster autoscaler, which automatically adjusts the number of nodes in a node pool.
    3. Networking: Define the networking setup for the AKS cluster. This can include network policies for pods, service CIDRs, and DNS settings among others.
    4. Storage: Define the storage classes and persistent volume claims if your application requires persistent storage.
    5. Provisioning compute resources: Define the machine types and the number of nodes for your AKS cluster. For GPU-backed model serving this usually means a dedicated GPU node pool; see the sketch after this list.
    6. Security: Define the role and identity assignments necessary for secure access to the cluster and its resources.

    Below is a Pulumi program written in Python to create a basic AKS cluster. The exact details, such as node size and count, will depend on the specific requirements of the large language model you are working with.

    import base64

    import pulumi
    import pulumi_azure_native as azure_native

    # Configuration for the AKS cluster
    RESOURCE_GROUP_NAME = 'pulumi-aks-resource-group'
    AKS_CLUSTER_NAME = 'pulumi-aks-cluster'  # also used as the DNS prefix, so use only letters, digits, and hyphens
    LOCATION = 'eastus'

    # Create a resource group in which all the resources will live
    resource_group = azure_native.resources.ResourceGroup(
        'resource_group',
        resource_group_name=RESOURCE_GROUP_NAME,
        location=LOCATION)

    # Create an AKS cluster with a single system node pool
    aks_cluster = azure_native.containerservice.ManagedCluster(
        'aks_cluster',
        resource_group_name=resource_group.name,
        identity=azure_native.containerservice.ManagedClusterIdentityArgs(
            type="SystemAssigned"),
        agent_pool_profiles=[{
            'count': 3,                         # start with 3 nodes
            'max_pods': 110,
            'mode': "System",
            'name': "agentpool",
            'os_disk_size_gb': 30,
            'os_type': "Linux",
            'vm_size': "Standard_DS2_v2",       # adjust for the model size and request throughput
            'type': "VirtualMachineScaleSets",  # use VMSS for the node pool
        }],
        dns_prefix=AKS_CLUSTER_NAME,
        location=resource_group.location,
        kubernetes_version='1.29',              # pin to a version currently supported by AKS in your region
        resource_name_=AKS_CLUSTER_NAME)        # the Azure-side cluster name

    # Export the cluster name
    pulumi.export('cluster_name', aks_cluster.name)

    # Export the kubeconfig needed to interact with the cluster.
    # The credentials API returns a base64-encoded kubeconfig, so decode it
    # and mark it as a secret before exporting.
    creds = pulumi.Output.all(resource_group.name, aks_cluster.name).apply(
        lambda args: azure_native.containerservice.list_managed_cluster_user_credentials(
            resource_group_name=args[0],
            resource_name=args[1]))
    kubeconfig = creds.apply(
        lambda c: base64.b64decode(c.kubeconfigs[0].value).decode('utf-8'))
    pulumi.export('kubeconfig', pulumi.Output.secret(kubeconfig))

    In the above code:

    • A resource group is created in which all the resources will live.
    • An AKS cluster is defined with an initial node count of three. The vm_size is set to Standard_DS2_v2, which may need to be adjusted for production use cases depending on the model size and request throughput.
    • Autoscaling is not explicitly enabled, but can be added by setting the enable_auto_scaling property in the agent_pool_profiles; a sketch follows this list.
    • The Kubernetes version is specified to ensure compatibility with the applications that will be deployed.
    • RBAC is enabled by default in AKS, but finer control can be added as per requirements.
    • Finally, the kubeconfig necessary to interact with the cluster is decoded, marked as a secret, and exported. After pulumi up completes, it can be retrieved with pulumi stack output kubeconfig --show-secrets and used by kubectl to connect to and manage the AKS cluster.
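
    As a minimal sketch of the autoscaling option mentioned above, the system pool profile from the program could be extended with the cluster autoscaler settings below; the min_count and max_count bounds are illustrative assumptions and should be tuned to your expected load.

    # Hypothetical agent pool profile with the cluster autoscaler enabled.
    autoscaling_pool_profile = {
        'count': 3,
        'enable_auto_scaling': True,        # turn on the cluster autoscaler for this pool
        'min_count': 3,                     # lower bound on node count
        'max_count': 10,                    # upper bound on node count
        'max_pods': 110,
        'mode': "System",
        'name': "agentpool",
        'os_disk_size_gb': 30,
        'os_type': "Linux",
        'vm_size': "Standard_DS2_v2",
        'type': "VirtualMachineScaleSets",  # the autoscaler requires VMSS-backed pools
    }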

    Make sure to replace the placeholder values with the actual parameters suited for your large language model workload. The resources required for your specific use case, such as node size and count, may vary based on the computational needs of your large language models. Additionally, for production workloads, you would want to configure advanced networking, monitoring, and security settings.
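
    As one example of those production settings, the sketch below shows networking and API server access options that could be passed to the ManagedCluster resource. The plugin choices, CIDR ranges, and authorized IP range are illustrative assumptions; adjust them to your own environment.

    import pulumi_azure_native as azure_native

    # Hypothetical production-oriented settings for the ManagedCluster resource.
    network_profile = azure_native.containerservice.ContainerServiceNetworkProfileArgs(
        network_plugin="azure",       # Azure CNI networking
        network_policy="azure",       # enforce Kubernetes NetworkPolicy objects
        service_cidr="10.96.0.0/16",
        dns_service_ip="10.96.0.10",
    )

    api_server_access_profile = azure_native.containerservice.ManagedClusterAPIServerAccessProfileArgs(
        authorized_ip_ranges=["203.0.113.0/24"],  # example range restricting API server access
    )

    These objects would be supplied as the network_profile and api_server_access_profile arguments of the ManagedCluster resource shown earlier.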