Scalable Network Interfaces for Azure Kubernetes AI Workloads
Creating a scalable set of network interfaces for Azure Kubernetes Service (AKS) to support AI workloads involves several steps. First, we'll create a virtual network and a subnet where the AKS cluster can live. Within that virtual network, we'll set up an AKS cluster configured to use Azure CNI (Container Networking Interface), an advanced networking option that gives more granular control over pod networking and assigns each pod a dedicated IP address from the subnet.
AKS provisions its Kubernetes nodes through Virtual Machine Scale Sets (VMSS), which let us scale the number of nodes in or out based on demand. Each node in the scale set is assigned a network interface that Azure creates and manages automatically.
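Because Azure CNI draws every pod IP from the subnet, the subnet's size limits how far the scale set can grow. The following is a rough sizing sketch, not an official tool. It assumes the AKS planning formula of (nodes + 1) × (max pods per node + 1) addresses, where the extra node covers the surge node created during upgrades, along with Azure's five reserved addresses per subnet and the Azure CNI default of 30 pods per node. The function names are illustrative.

```python
import ipaddress

AZURE_RESERVED_IPS = 5  # Azure reserves 5 addresses in every subnet


def required_ips(nodes: int, max_pods: int = 30) -> int:
    """IPs needed for `nodes` nodes plus one surge node, each hosting max_pods pods."""
    return (nodes + 1) * (max_pods + 1)


def max_nodes(subnet_cidr: str, max_pods: int = 30) -> int:
    """Largest node count whose IP requirement still fits in the subnet."""
    usable = ipaddress.ip_network(subnet_cidr).num_addresses - AZURE_RESERVED_IPS
    return max(usable // (max_pods + 1) - 1, 0)


print(required_ips(3))           # 124
print(max_nodes("10.0.1.0/24"))  # 7
```

Under these assumptions, the 10.0.1.0/24 subnet used in the program below has room for about seven nodes at the default pod density, so a larger subnet or a lower max-pods setting is worth considering if the node pool needs to scale further.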
Here’s how you can set up scalable network interfaces for an AKS cluster aimed at AI workloads using Pulumi:
Explanation and Program:
- Network Set-up: We'll start by creating a virtual network and a subnet. These provide the foundational network resources on which our AKS cluster will be built.
- AKS Cluster Creation: We’ll define an AKS cluster, configured with a `NetworkProfile` that specifies the use of Azure CNI for networking. This is crucial for AI workloads that may require enhanced network performance.
- Scaling with VMSS: The AKS cluster nodes will use a Virtual Machine Scale Set, which provides scalability for our AI applications. The scale set automatically manages the creation of network interfaces for each node.
- Subnet Association: The AKS cluster is associated with the subnet we created, ensuring that all nodes and pods live within our custom-designed virtual network.
    import base64

    import pulumi
    import pulumi_azure_native as azure_native

    # Name of an existing resource group; replace the placeholder before deploying
    resource_group_name = "<resource_group_name>"

    # Set up the virtual network and associated subnet for AKS
    vnet = azure_native.network.VirtualNetwork(
        "vnet",
        address_space=azure_native.network.AddressSpaceArgs(
            address_prefixes=["10.0.0.0/16"],
        ),
        resource_group_name=resource_group_name,
    )

    subnet = azure_native.network.Subnet(
        "aks-subnet",
        address_prefix="10.0.1.0/24",
        resource_group_name=resource_group_name,
        virtual_network_name=vnet.name,
    )

    # Create the AKS cluster with advanced networking features using Azure CNI
    aks_cluster = azure_native.containerservice.ManagedCluster(
        "aksCluster",
        agent_pool_profiles=[
            azure_native.containerservice.ManagedClusterAgentPoolProfileArgs(
                name="agentpool",
                mode="System",
                count=3,
                vm_size="Standard_D2_v2",
                vnet_subnet_id=subnet.id,  # place nodes and pods in our subnet
            )
        ],
        dns_prefix="akspulumi",
        enable_rbac=True,
        network_profile=azure_native.containerservice.ContainerServiceNetworkProfileArgs(
            network_plugin="azure",  # Azure CNI
            network_policy="azure",
            service_cidr="10.0.2.0/24",
            dns_service_ip="10.0.2.10",
            # Deprecated in newer AKS API versions; drop it if your provider rejects it
            docker_bridge_cidr="172.17.0.1/16",
        ),
        # 1.19.11 is no longer supported by AKS; substitute a current version
        # (for example, one listed by `az aks get-versions`)
        kubernetes_version="1.19.11",
        resource_group_name=resource_group_name,
        service_principal_profile=azure_native.containerservice.ManagedClusterServicePrincipalProfileArgs(
            client_id="<service_principal_client_id>",
            secret="<service_principal_secret>",
        ),
    )

    # Export the AKS cluster properties needed for connection
    creds = aks_cluster.name.apply(
        lambda name: azure_native.containerservice.list_managed_cluster_user_credentials(
            resource_group_name=resource_group_name,
            resource_name=name,
        )
    )

    # The API returns the kubeconfig base64-encoded; decode it and mark it as a secret
    kubeconfig = pulumi.Output.secret(
        creds.apply(lambda c: base64.b64decode(c.kubeconfigs[0].value).decode("utf-8"))
    )

    pulumi.export("kubeConfig", kubeconfig)
Replace `<resource_group_name>`, `<service_principal_client_id>`, and `<service_principal_secret>` with your Azure resource group name and service principal credentials. The service principal is used for authentication with the Azure API and should be created prior to running this code.

The `agent_pool_profiles` define the configuration for the node pool within the AKS cluster. The VM size setting (`vmSize` in the Azure API, `vm_size` in Pulumi's Python argument classes) selects the size of each node VM, which determines the available CPU and memory, both key for AI workloads.

The `network_profile` configuration is where the Azure CNI plugin is specified. `service_cidr` and `dns_service_ip` define the address range for Kubernetes services and the cluster DNS address within that range, while `docker_bridge_cidr` sets the Docker bridge network. None of these should overlap with any other network range in use. The example's service range, 10.0.2.0/24, does not collide with the node subnet 10.0.1.0/24, but it does sit inside the virtual network's 10.0.0.0/16 address space; a range outside the virtual network is the safer choice.

Finally, we export the credentials required to connect to your Kubernetes cluster with the `pulumi.export` function. This output can be used to configure `kubectl` to interact with your AKS cluster.

Please note that this is a simplified example, and additional configurations would likely be needed for real-world AI workload deployments, including considerations for AI accelerator hardware, storage, and more nuanced security configurations.
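
The non-overlap rule for the `network_profile` ranges is easy to check before running `pulumi up`. Here is a minimal sketch using Python's standard `ipaddress` module. The helper name `check_ranges` and its return format are hypothetical, and the values passed in are the ones from the program above. It also applies the AKS guidance that the DNS service IP must fall inside the service range and should not be the first address in it.

```python
import ipaddress


def check_ranges(subnets, service_cidr, dns_service_ip):
    """Return a list of problems found; an empty list means no conflicts."""
    problems = []
    service = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)

    # The service range must not overlap any subnet that hosts nodes or pods
    for cidr in subnets:
        if service.overlaps(ipaddress.ip_network(cidr)):
            problems.append(f"service_cidr {service_cidr} overlaps subnet {cidr}")

    # The DNS service IP must lie inside the service range...
    if dns_ip not in service:
        problems.append(f"dns_service_ip {dns_service_ip} is outside {service_cidr}")
    # ...and must not be the network address or the first address in the range
    elif dns_ip in (service.network_address, service.network_address + 1):
        problems.append(f"dns_service_ip {dns_service_ip} is reserved in {service_cidr}")

    return problems


# Values from the example program: no conflicts are reported
print(check_ranges(["10.0.1.0/24"], "10.0.2.0/24", "10.0.2.10"))  # []
```

This check only compares the ranges it is given. Peered virtual networks and on-premises ranges would need to be added to `subnets` by hand.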