Low-Latency Inference Data Storage on Azure Managed Disks

Question

Pulumi · Accepted Answer

To build low-latency storage for inference data on Azure, you can use Azure Managed Disks: block-level storage volumes that attach to Azure virtual machines (VMs). Azure manages the underlying storage accounts for you, so you only choose the disk size and type. Managed Disks are designed for high durability and availability, and they come in several performance tiers, so you can match the disk to the IOPS, throughput, and latency your application requires.

In Pulumi, you can use the `azure-native.compute.Disk` resource to provision and manage these disks. This resource lets you create a new managed disk and specify its size, its SKU, and other properties such as encryption and network access policies. On SKUs that support it, such as Ultra Disk, you can also set explicit IOPS and throughput targets.

Below is a Pulumi program in Python that creates an Azure Managed Disk intended for low-latency scenarios. This program presumes that you have already configured Pulumi with the necessary Azure credentials and set the appropriate Azure region.

```python
import pulumi
import pulumi_azure_native as azure_native

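# Create a resource group to hold the disk ("inferenceStorage" is an example name)
resource_group = azure_native.resources.ResourceGroup("inferenceStorage",
    location="eastus",
)

# Create an Azure Managed Disk optimized for low latency operations
low_latency_disk = azure_native.compute.Disk("lowLatencyDisk",
    # The resource group is a required argument for azure-native disks
    resource_group_name=resource_group.name,
    # Location must be set to the region where you want to create the disk
    location="eastus",
    # Define the size of the disk (in GB). Choose a size that fits your use-case.
    # On-demand bursting (enabled below) requires a Premium SSD larger than 512 GiB.
    disk_size_gb=1024,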
    # Define the disk SKU which determines the performance and cost characteristics.
    # For low latency scenarios, consider 'Premium_LRS' or 'UltraSSD_LRS'
    sku=azure_native.compute.DiskSkuArgs(
        name="Premium_LRS",  # 'Premium_LRS' represents premium SSDs with good performance and latency characteristics
    ),
    # Set the creation data for the disk. 'Empty' indicates an empty disk will be created.
    creation_data=azure_native.compute.CreationDataArgs(
        create_option="Empty",
    ),
    # Enable on-demand bursting so the disk can exceed its provisioned performance during spikes.
    # Only Premium SSD disks larger than 512 GiB support this, and it is billed separately.
    bursting_enabled=True,
    # Apply tags to the disk resource, if required
    tags={"purpose": "low-latency-inference"}
)

# Export the ID of the created disk. This is already the full Azure Resource Manager
# ID (URI) of the disk, which is what you reference when attaching it to a VM.
pulumi.export("disk_id", low_latency_disk.id)
```

In this program:
- We use `azure_native.compute.Disk` to create a new Managed Disk.
- We specify the location, size (`disk_size_gb`), and SKU for the disk. You might choose Premium SSD (`Premium_LRS`) or Ultra Disk (`UltraSSD_LRS`), depending on the IOPS, throughput, and latency your workload needs.
- We set `create_option` to "Empty" because we are creating a new empty disk.
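- We enable on-demand bursting, which lets the disk exceed its provisioned IOPS and throughput for short periods and can reduce latency during traffic spikes. Azure supports on-demand bursting only on Premium SSD disks larger than 512 GiB and bills it separately; smaller Premium SSD disks use credit-based bursting automatically and do not need this flag.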
- We apply a tag to the disk to denote its purpose.
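- We export the disk ID. For a managed disk this is the full Azure Resource Manager ID (`/subscriptions/.../resourceGroups/.../providers/Microsoft.Compute/disks/...`), which is the value you reference when attaching the disk to a virtual machine or managing it with other tools.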
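
Because the bursting flag is only accepted for certain SKU and size combinations, it can help to check the combination before running `pulumi up`. The following is a minimal sketch of such a check in plain Python. The helper name `supports_on_demand_bursting` and the SKU set are assumptions made for illustration and are not part of Pulumi or the Azure SDK; the 512 GiB threshold reflects Azure's documented on-demand bursting limit at the time of writing and should be verified against current Azure documentation.

```python
# Hypothetical pre-flight check; not part of Pulumi or the Azure SDK.

# Assumption: SKUs treated as "Premium SSD" for on-demand bursting purposes.
PREMIUM_SSD_SKUS = {"Premium_LRS", "Premium_ZRS"}

# On-demand bursting requires a disk strictly larger than this size.
ON_DEMAND_BURSTING_MIN_SIZE_GIB = 512


def supports_on_demand_bursting(sku_name: str, disk_size_gb: int) -> bool:
    """Return True if bursting_enabled=True is expected to be accepted."""
    return (
        sku_name in PREMIUM_SSD_SKUS
        and disk_size_gb > ON_DEMAND_BURSTING_MIN_SIZE_GIB
    )


if __name__ == "__main__":
    for sku, size in [("Premium_LRS", 128), ("Premium_LRS", 1024), ("UltraSSD_LRS", 1024)]:
        print(sku, size, supports_on_demand_bursting(sku, size))
```

You could use the result to decide whether to pass `bursting_enabled=True` to the `Disk` resource or to leave the argument out.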

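Keep in mind that you should replace `"eastus"` with the region your inference VMs run in. A managed disk can only be attached to a VM in the same region, and running that VM close to your users keeps end-to-end latency low. Additionally, confirm that the VM size supports the disk type you choose: Premium SSD requires a VM size that supports premium storage, and Ultra Disk is only available in certain regions and VM sizes and must be enabled on the VM.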
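
If Premium SSD does not meet your latency target, the Ultra Disk option mentioned above lets you set IOPS and throughput directly. Here is a minimal sketch of that variant, under some assumptions: the config key `resourceGroupName` is hypothetical and must point to an existing resource group, the zone `"1"` is an example, and the IOPS and throughput numbers are illustrative rather than recommendations. Bursting does not apply to Ultra Disk, so that argument is omitted.

```python
import pulumi
import pulumi_azure_native as azure_native

# Assumption: a stack config value (hypothetical key) names an existing resource group
config = pulumi.Config()
resource_group_name = config.require("resourceGroupName")

# Ultra Disk with explicitly provisioned performance targets
ultra_disk = azure_native.compute.Disk("ultraInferenceDisk",
    resource_group_name=resource_group_name,
    location="eastus",
    # Example availability zone; check which zones offer Ultra Disk in your region
    zones=["1"],
    disk_size_gb=256,
    sku=azure_native.compute.DiskSkuArgs(name="UltraSSD_LRS"),
    creation_data=azure_native.compute.CreationDataArgs(create_option="Empty"),
    # Illustrative targets: provisioned IOPS and throughput in MB/s
    disk_iops_read_write=8000,
    disk_m_bps_read_write=400,
    tags={"purpose": "low-latency-inference"},
)

pulumi.export("ultra_disk_id", ultra_disk.id)
```

Set the resource group with `pulumi config set resourceGroupName <name>` before deploying, and check the current Ultra Disk limits for the size you pick, since the allowed IOPS and throughput ranges depend on the disk size.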