Network Isolation for Azure Kubernetes Service (AI Workloads)
Implementing network isolation for Azure Kubernetes Service (AKS), particularly for AI workloads, involves several steps and resources. Azure offers services and features for this, including virtual networks, subnet configuration, network policies, and private connections.
In the context of Pulumi and infrastructure as code, I'll guide you through the process of creating an isolated network environment for your AKS cluster using Pulumi's Python SDK. We'll leverage Azure resources such as:
- VirtualNetwork and Subnet: These resources will be used to create a virtual network and a dedicated subnet for the AKS cluster.
- NetworkProfile: This defines the network configuration for the AKS cluster, including any network policies.
- PrivateLinkService and PrivateEndpointConnection: To connect privately to Azure resources and establish private endpoint connections, enabling private access to the AKS API server.
- RoleAssignment: To define the permissions for the AKS service principal to interact with the necessary network resources.
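The program that follows does not itself create the RoleAssignment from this list. As a rough sketch, the arguments for such an assignment could be assembled as shown here. The helper name network_contributor_assignment_args and its parameters are hypothetical; the GUID is the built-in Network Contributor role, and the principal must be identified by the service principal's object ID rather than its client ID.

```python
# Built-in "Network Contributor" role definition GUID (identical in every tenant).
NETWORK_CONTRIBUTOR_ROLE_ID = "4d97b98b-1d4b-4787-a291-c67834d212e7"


def network_contributor_assignment_args(subscription_id, subnet_id, principal_object_id):
    """Build keyword arguments for a role assignment scoped to one subnet.

    Hypothetical helper: the returned dict is meant to be unpacked into
    azure_native.authorization.RoleAssignment(name, **args).
    """
    role_definition_id = (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
        f"/roleDefinitions/{NETWORK_CONTRIBUTOR_ROLE_ID}"
    )
    return {
        "principal_id": principal_object_id,    # object ID, not the client ID
        "principal_type": "ServicePrincipal",
        "role_definition_id": role_definition_id,
        "scope": subnet_id,                      # limit the grant to the AKS subnet
    }


# Example with placeholder values only.
example = network_contributor_assignment_args(
    "00000000-0000-0000-0000-000000000000",
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg"
    "/providers/Microsoft.Network/virtualNetworks/vnet/subnets/aks",
    "11111111-1111-1111-1111-111111111111",
)
print(example["role_definition_id"])
```

In a Pulumi program the subnet ID would normally be the subnet resource's id output, passed in place of the literal string used in this example.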
In the following program, we will set up an AKS cluster within a virtual network that is isolated from the public internet. Users will need to replace placeholder values with their own resource names, locations, and credentials where necessary.
Let's begin by writing the Pulumi program:
    import pulumi
    import pulumi_azure_native as azure_native

    # Replace these variables with your own values
    RESOURCE_GROUP_NAME = "my-ai-workload-rg"
    LOCATION = "eastus"  # short region name, so it is also valid inside the FQDN below
    CLUSTER_NAME = "my-aks-cluster"
    VNET_NAME = "my-vnet"
    SUBNET_NAME = "my-aks-subnet"
    AKS_NODE_SIZE = "Standard_DS2_v2"
    AKS_NODE_COUNT = 2

    # Read the subscription ID once from the stack configuration
    subscription_id = pulumi.Config().require("subscriptionId")

    # Create an Azure Resource Group
    resource_group = azure_native.resources.ResourceGroup(
        RESOURCE_GROUP_NAME,
        resource_group_name=RESOURCE_GROUP_NAME,
        location=LOCATION,
    )

    # Create a Virtual Network for the AKS Cluster
    vnet = azure_native.network.VirtualNetwork(
        VNET_NAME,
        resource_group_name=resource_group.name,
        location=resource_group.location,
        address_space=azure_native.network.AddressSpaceArgs(
            address_prefixes=["10.0.0.0/16"],
        ),
    )

    # Create a Subnet for the AKS Cluster
    subnet = azure_native.network.Subnet(
        SUBNET_NAME,
        resource_group_name=resource_group.name,
        virtual_network_name=vnet.name,
        address_prefix="10.0.0.0/24",
        private_link_service_network_policies="Disabled",
        private_endpoint_network_policies="Enabled",
    )

    # Create the AKS Cluster
    aks_cluster = azure_native.containerservice.ManagedCluster(
        CLUSTER_NAME,
        resource_group_name=resource_group.name,
        location=resource_group.location,
        dns_prefix="aksk8s",
        agent_pool_profiles=[
            azure_native.containerservice.ManagedClusterAgentPoolProfileArgs(
                count=AKS_NODE_COUNT,
                vm_size=AKS_NODE_SIZE,
                name="agentpool",
                vnet_subnet_id=subnet.id,
                max_pods=110,
                os_type="Linux",
                type="VirtualMachineScaleSets",
                mode="System",
            )
        ],
        # Placeholders: prefer loading the secret from encrypted stack config
        service_principal_profile=azure_native.containerservice.ManagedClusterServicePrincipalProfileArgs(
            client_id="your-service-principal-client-id",
            secret="your-service-principal-secret",
        ),
        # Private cluster mode keeps the API server off the public internet and
        # is what populates private_fqdn, which is exported at the end.
        api_server_access_profile=azure_native.containerservice.ManagedClusterAPIServerAccessProfileArgs(
            enable_private_cluster=True,
        ),
        # docker_bridge_cidr is omitted: it is optional and deprecated in recent AKS API versions
        network_profile=azure_native.containerservice.ContainerServiceNetworkProfileArgs(
            network_plugin="azure",
            service_cidr="10.10.0.0/16",
            dns_service_ip="10.10.0.10",
        ),
    )

    # Exposing the AKS Cluster with Private Link
    # NOTE: Azure also expects load_balancer_frontend_ip_configurations pointing at a
    # Standard Load Balancer frontend. This program does not create one, so add it
    # before deploying this resource.
    private_link_service = azure_native.network.PrivateLinkService(
        "my-private-link-service",
        resource_group_name=resource_group.name,
        location=resource_group.location,
        ip_configurations=[
            azure_native.network.PrivateLinkServiceIpConfigurationArgs(
                name="my-private-link-service-ip-configuration",
                private_ip_address_version="IPv4",
                subnet=azure_native.network.SubnetArgs(id=subnet.id),
            )
        ],
        auto_approval=azure_native.network.PrivateLinkServicePropertiesAutoApprovalArgs(
            subscriptions=[subscription_id],
        ),
        visibility=azure_native.network.PrivateLinkServicePropertiesVisibilityArgs(
            subscriptions=[subscription_id],
        ),
        fqdns=[f"{CLUSTER_NAME}.{LOCATION}.azmk8s.io"],
    )

    # Exporting the AKS Cluster API Server endpoint
    pulumi.export("AKS Cluster Endpoint", aks_cluster.private_fqdn)
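Because the program selects the Azure CNI plugin (network_plugin set to "azure"), every pod takes an IP address from the node subnet, so the subnet size limits how many nodes fit. The sketch below estimates the requirement with the sizing rule from Azure's documentation, (nodes + 1) * (max pods + 1), where the extra node covers a surge node during upgrades, and assumes Azure reserves five addresses in each subnet. The function names are illustrative only.

```python
import ipaddress

# Azure reserves five addresses in every subnet (assumption stated in the text).
AZURE_RESERVED_IPS = 5


def usable_ips(subnet_cidr):
    """Addresses left in the subnet after Azure's reserved ones."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AZURE_RESERVED_IPS


def required_ips(node_count, max_pods, surge_nodes=1):
    """Each node needs one IP for itself plus one per potential pod."""
    return (node_count + surge_nodes) * (max_pods + 1)


# Values taken from the program: 2 nodes, maxPods 110, subnet 10.0.0.0/24.
steady_state = required_ips(2, 110, surge_nodes=0)
during_upgrade = required_ips(2, 110)
available = usable_ips("10.0.0.0/24")

print(f"steady state: {steady_state}, during upgrade: {during_upgrade}, available: {available}")
```

With these values the steady state needs 222 addresses and fits into the 251 usable ones, but an upgrade with one surge node would need 333. A larger prefix such as /23, or a lower maxPods, may be worth considering.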
Here’s a breakdown of what this Pulumi program is doing:
- Defines a resource group for all resources (ResourceGroup).
- Creates a virtual network (VirtualNetwork) with a custom address space.
- Adds a subnet (Subnet) within the virtual network. Note that we disable private_link_service_network_policies so that a Private Link Service can use this subnet.
- Creates an AKS cluster (ManagedCluster) whose node pool is placed in the subnet. The cluster's network profile specifies the network configuration details, such as the service CIDR.
- Sets up a Private Link Service (PrivateLinkService) with an IP configuration that uses the subnet that was created, visible only to the configured subscription. Private access to the AKS API server itself depends on the cluster's private cluster setting (enablePrivateCluster), which makes AKS create the API server's private endpoint.
- Exports the private fully qualified domain name (FQDN) of the AKS cluster, which clients inside the virtual network can use to reach the API server. AKS populates this value only for private clusters.
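The address ranges in the breakdown above have to fit together: the subnet must lie inside the virtual network, the service CIDR must not overlap it, and the DNS service IP must come from the service CIDR without being its first address. A minimal sketch of these checks using only the standard library follows; check_aks_cidrs is a hypothetical helper, not part of Pulumi or the Azure SDK.

```python
import ipaddress
from typing import List


def check_aks_cidrs(vnet_cidr, subnet_cidr, service_cidr, dns_service_ip) -> List[str]:
    """Return a list of problems found; an empty list means the ranges are consistent."""
    vnet = ipaddress.ip_network(vnet_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    service = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)

    problems = []
    if not subnet.subnet_of(vnet):
        problems.append(f"subnet {subnet} is not inside virtual network {vnet}")
    if service.overlaps(vnet):
        problems.append(f"service CIDR {service} overlaps virtual network {vnet}")
    if dns_ip not in service:
        problems.append(f"DNS service IP {dns_ip} is outside service CIDR {service}")
    elif dns_ip == service.network_address + 1:
        problems.append(f"DNS service IP {dns_ip} is the first address of {service}")
    return problems


# Values from the program in this article.
print(check_aks_cidrs("10.0.0.0/16", "10.0.0.0/24", "10.10.0.0/16", "10.10.0.10"))
```

Running a check like this before pulumi up can catch range mistakes locally instead of waiting for the Azure API to reject the deployment.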
To use this code, replace the placeholder values with your own: the service principal client ID and secret, the subscription ID (read from the subscriptionId stack configuration key), and the resource names and region defined at the top of the program.
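Rather than pasting the service principal secret into the source file, the values could be read from the environment. The following is a minimal sketch, assuming the ARM_CLIENT_ID, ARM_CLIENT_SECRET, and ARM_SUBSCRIPTION_ID variables hold them (these are the names the Azure Native provider reads for service principal authentication). The load_azure_settings helper is hypothetical.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("ARM_CLIENT_ID", "ARM_CLIENT_SECRET", "ARM_SUBSCRIPTION_ID")


def load_azure_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect credentials from the environment, failing early if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["ARM_CLIENT_ID"],
        "secret": env["ARM_CLIENT_SECRET"],
        "subscription_id": env["ARM_SUBSCRIPTION_ID"],
    }


# Example with placeholder values passed explicitly instead of the real environment.
settings = load_azure_settings({
    "ARM_CLIENT_ID": "example-client-id",
    "ARM_CLIENT_SECRET": "example-secret",
    "ARM_SUBSCRIPTION_ID": "example-subscription-id",
})
print(sorted(settings))
```

Inside a Pulumi program, wrapping the secret with pulumi.Output.secret before passing it to the cluster keeps it encrypted in the stack state.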
Please ensure you have the Azure CLI installed and signed in with sufficient permissions, and the Pulumi CLI set up with an account to store the state of your infrastructure. Then run pulumi up to create the infrastructure.