Enhanced Inference Performance with Azure Proximity Placement Groups

Question

Pulumi · Accepted Answer

When inference performance depends on minimal network latency, Azure Proximity Placement Groups (PPGs) can help. A PPG is a logical grouping that keeps Azure compute resources physically close to each other. This is particularly useful when a set of interdependent services needs low-latency network connections.

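For instance, if you're running a machine learning inference service, you might want the VMs hosting the models and the VMs they depend on (for example, ones running a self-managed database or cache) to be placed as close together as possible within an Azure region to lower response time. Note that a PPG constrains compute resources only (virtual machines, virtual machine scale sets, and availability sets). Managed services such as Azure Storage or Azure SQL Database cannot be placed in one.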
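To check whether co-location actually lowers response time, you can compare latency from a client VM to targets inside and outside the group. The sketch below is a minimal illustration, assuming the target exposes a reachable TCP port (SSH on port 22 is only an example). The function name `tcp_connect_latency_ms` is hypothetical. It uses TCP handshake time as a rough proxy for round-trip latency and is not a replacement for dedicated tools such as SockPerf.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 20, timeout: float = 2.0) -> dict:
    """Measure TCP handshake time to host:port and summarise it in milliseconds.

    A TCP connect completes after one network round trip (SYN / SYN-ACK),
    so its duration is a rough proxy for round-trip latency between two hosts.
    Pass an IP address rather than a hostname to keep DNS lookups out of the timing.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection performs the TCP handshake; the socket is closed right away.
        with socket.create_connection((host, port), timeout=timeout):
            timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }
```

For example, running `tcp_connect_latency_ms("10.0.1.5", 22)` from another VM in the same virtual network (the address is hypothetical) before and after moving both VMs into the PPG gives a simple before/after comparison. The median is usually more informative than the maximum, which is sensitive to occasional scheduling hiccups.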

Below is a Pulumi program written in Python that shows how to create a Proximity Placement Group in Azure using the `azure-native` provider. It creates a Proximity Placement Group, a Virtual Network, a Subnet, a Network Interface, and a Virtual Machine assigned to the Proximity Placement Group. Any other VMs you add to the same group, such as the ones your inference service calls, will then be allocated physically close to this one.

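Before you start, make sure Pulumi and the Azure CLI are installed. Log in to your Azure account with `az login` and select the subscription you want to use. The program also relies on the stack's default region, which you can set with `pulumi config set azure-native:location <region>`.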
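As an optional preflight step, separate from the Pulumi program itself, a short script can confirm that both command-line tools are on your `PATH`. This is a minimal sketch using only the standard library. The function name `missing_cli_tools` is hypothetical, and it assumes the executables are named `pulumi` and `az`.

```python
import shutil


def missing_cli_tools(tools=("pulumi", "az")) -> list:
    """Return the names of required command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_cli_tools()
    if missing:
        print("Install these tools before continuing:", ", ".join(missing))
    else:
        print("Pulumi and Azure CLI found; run `az login` and `az account set --subscription <id>` next.")
```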

```python
import pulumi
import pulumi_azure_native as azure_native

# Resource group that holds every resource below. Its location comes from the
# `azure-native:location` stack setting and is reused so everything lands in one region.
resource_group = azure_native.resources.ResourceGroup("resourceGroup")

# Create a Proximity Placement Group for holding related resources close together.
proximity_placement_group = azure_native.compute.ProximityPlacementGroup(
    "inferencePPG",
    resource_group_name=resource_group.name,
    location=resource_group.location,
    # Proximity Placement Groups can be of type 'Standard' or 'Ultra' depending on your requirements.
    proximity_placement_group_type="Standard",
)

# Create a Virtual Network for our VM to attach to.
vnet = azure_native.network.VirtualNetwork(
    "inferenceVNet",
    resource_group_name=resource_group.name,
    location=resource_group.location,
    address_space=azure_native.network.AddressSpaceArgs(
        address_prefixes=["10.0.0.0/16"],
    ),
)

# Create a Subnet within the Virtual Network for our VM.
subnet = azure_native.network.Subnet(
    "inferenceSubnet",
    resource_group_name=resource_group.name,
    virtual_network_name=vnet.name,
    address_prefix="10.0.1.0/24",
)

# Static public IP, so the address is known as soon as the resource is created.
# Standard SKU public IPs block inbound traffic unless a network security group allows it.
public_ip = azure_native.network.PublicIPAddress(
    "inferencePublicIP",
    resource_group_name=resource_group.name,
    location=resource_group.location,
    public_ip_allocation_method="Static",
    sku=azure_native.network.PublicIPAddressSkuArgs(name="Standard"),
)

# Create a Network Interface in the subnet and attach the public IP to it.
network_interface = azure_native.network.NetworkInterface(
    "inferenceNIC",
    resource_group_name=resource_group.name,
    location=resource_group.location,
    ip_configurations=[azure_native.network.NetworkInterfaceIPConfigurationArgs(
        name="inferenceNICConf",
        private_ip_allocation_method="Dynamic",
        subnet=azure_native.network.SubnetArgs(id=subnet.id),
        public_ip_address=azure_native.network.PublicIPAddressArgs(id=public_ip.id),
    )],
)

# Create a Virtual Machine that is associated with the Proximity Placement Group and Network Interface.
virtual_machine = azure_native.compute.VirtualMachine(
    "inferenceVM",
    resource_group_name=resource_group.name,
    location=resource_group.location,
    hardware_profile=azure_native.compute.HardwareProfileArgs(
        vm_size="Standard_DS1_v2",
    ),
    storage_profile=azure_native.compute.StorageProfileArgs(
        # Ubuntu Server 22.04 LTS marketplace image.
        image_reference=azure_native.compute.ImageReferenceArgs(
            publisher="Canonical",
            offer="0001-com-ubuntu-server-jammy",
            sku="22_04-lts",
            version="latest",
        ),
        os_disk=azure_native.compute.OSDiskArgs(
            create_option="FromImage",
            name="myosdisk1",
        ),
    ),
    os_profile=azure_native.compute.OSProfileArgs(
        computer_name="inferenceVM",
        admin_username="adminuser",
        # Placeholder only: never commit a real password (see the note on secrets below).
        admin_password="Password1234!",
    ),
    network_profile=azure_native.compute.NetworkProfileArgs(
        network_interfaces=[
            azure_native.compute.NetworkInterfaceReferenceArgs(
                id=network_interface.id,
            ),
        ],
    ),
    # Membership in the PPG is what keeps this VM physically close to other members.
    proximity_placement_group=azure_native.compute.SubResourceArgs(
        id=proximity_placement_group.id,
    ),
)

# Export the VM's public IP so you can reach it (for example over SSH) if needed.
pulumi.export("publicIP", public_ip.ip_address)
```
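Before deploying, you can confirm that the subnet prefix fits inside the virtual network's address space and see how many addresses it leaves for VMs you may later add to the proximity placement group. The following small sketch uses Python's standard `ipaddress` module. The reserved-address count reflects Azure's practice of reserving five addresses in every subnet.

```python
import ipaddress

# Address ranges copied from the Pulumi program above.
vnet_range = ipaddress.ip_network("10.0.0.0/16")
subnet_range = ipaddress.ip_network("10.0.1.0/24")

# The subnet must sit entirely inside the VNet's address space.
assert subnet_range.subnet_of(vnet_range)

# Azure reserves the network address, the first three host addresses,
# and the broadcast address in each subnet, so five fewer remain for NICs.
AZURE_RESERVED_PER_SUBNET = 5
usable = subnet_range.num_addresses - AZURE_RESERVED_PER_SUBNET
print(f"{subnet_range} offers {usable} usable addresses for VMs")  # 251
```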

Let's break down what each part of the code is doing:

- **Proximity Placement Group**: We create a proximity placement group of type `Standard`. It holds our virtual machine and any other VMs, scale sets, or availability sets you add later, ensuring they are physically located close together within an Azure datacenter for reduced latency.

- **Virtual Network and Subnet**: We define a virtual network (VNet) with the address space 10.0.0.0/16 and a subnet, 10.0.1.0/24, inside it. The VM's network interface is attached to this subnet.

- **Network Interface (NIC)**: We create a network interface and attach it to our virtual machine. The NIC gives the VM a private IP address in the subnet. Reaching the VM from outside Azure additionally requires a public IP address on the NIC.

- **Virtual Machine (VM)**: We deploy a virtual machine attached to the Proximity Placement Group and the created Network Interface.

- **Outputs**: We export the public IP address of the VM, which could be used to SSH into the machine or to access services running on it.
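Pulumi does not simply create these resources one after another in file order. It tracks which Outputs each resource consumes (for example, the NIC uses `subnet.id`) and runs independent operations in parallel. The following is a rough model of that ordering for the five resources described above, built with the standard library's `graphlib` (Python 3.9+). It illustrates topological ordering and is not Pulumi's actual engine. The dictionary and function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources whose Outputs it consumes:
# the subnet needs vnet.name, the NIC needs subnet.id, and the VM needs
# both network_interface.id and proximity_placement_group.id.
DEPENDENCIES = {
    "Proximity Placement Group": set(),
    "Virtual Network": set(),
    "Subnet": {"Virtual Network"},
    "Network Interface": {"Subnet"},
    "Virtual Machine": {"Network Interface", "Proximity Placement Group"},
}


def creation_order(dependencies: dict) -> list:
    """Return one valid creation order in which every resource follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, name in enumerate(creation_order(DEPENDENCIES), start=1):
        print(step, name)
```

Note that the PPG and the VNet have no dependencies on each other, which is why Pulumi can create them at the same time. The VM is always last because it needs both.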

In a real-world scenario, store secrets such as the admin password in Azure Key Vault or as Pulumi secrets rather than hard-coding them as shown in this program.
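If you need a throwaway admin password while setting that up, generating one randomly is safer than committing a literal. The sketch below is a minimal, standard-library example. The function name `generate_admin_password` and the chosen special-character set are assumptions. It guarantees one character from each of the four classes, which satisfies Azure's rule that a VM admin password contain characters from at least three of them.

```python
import secrets
import string

# Lowercase, uppercase, digits, and a conservative set of special characters.
CLASSES = (string.ascii_lowercase, string.ascii_uppercase, string.digits, "!@#$%^&*()-_=+")


def generate_admin_password(length: int = 24) -> str:
    """Generate a random password containing at least one character from every class."""
    if length < len(CLASSES):
        raise ValueError("length must be at least the number of character classes")
    # One guaranteed character per class, the rest drawn from the combined alphabet.
    chars = [secrets.choice(cls) for cls in CLASSES]
    alphabet = "".join(CLASSES)
    chars += [secrets.choice(alphabet) for _ in range(length - len(CLASSES))]
    # Shuffle with a cryptographically secure RNG so the guaranteed characters are not always first.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```

You could then store the value as an encrypted stack setting with `pulumi config set --secret adminPassword <value>` (the key name `adminPassword` is hypothetical) and read it in the program with `pulumi.Config().require_secret("adminPassword")`, so it is never written to source control in plain text.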

Finally, choose a VM size that fits your workload. "Standard_DS1_v2" (1 vCPU, 3.5 GiB of memory) is only a placeholder here and is unlikely to be enough for a machine learning inference service, especially one that needs a GPU.
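One way to narrow the choice is to filter the sizes available in your region against your workload's minimum requirements. The sketch below assumes entries shaped like the output of `az vm list-sizes --location <region>` (`name`, `numberOfCores`, `memoryInMB`). The three DSv2 entries are illustrative sample data only, and the helper name `smallest_fitting_size` is hypothetical. GPU requirements would need additional filtering that this sketch does not attempt.

```python
# Sample entries shaped like `az vm list-sizes` output; query your own region for real candidates.
SAMPLE_SIZES = [
    {"name": "Standard_DS1_v2", "numberOfCores": 1, "memoryInMB": 3584},
    {"name": "Standard_DS2_v2", "numberOfCores": 2, "memoryInMB": 7168},
    {"name": "Standard_DS3_v2", "numberOfCores": 4, "memoryInMB": 14336},
]


def smallest_fitting_size(sizes, min_cores, min_memory_gib):
    """Return the name of the smallest size (by cores, then memory) meeting both minimums, or None."""
    candidates = [
        s for s in sizes
        if s["numberOfCores"] >= min_cores and s["memoryInMB"] >= min_memory_gib * 1024
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: (s["numberOfCores"], s["memoryInMB"]))
    return best["name"]


print(smallest_fitting_size(SAMPLE_SIZES, min_cores=2, min_memory_gib=6))  # Standard_DS2_v2
```

Picking the smallest size that fits keeps cost down. Benchmark your model on the chosen size before committing, since inference throughput depends on more than core count and memory.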

Once deployed with Pulumi, the VM belongs to the proximity placement group. Any other VMs you add to the same group (for example, ones hosting a self-managed database or feature store) will be allocated physically close to it, minimizing network latency for your inference workload.