Efficient AI Development with Azure Spot VMs
When you run AI development workloads on Azure, Azure Spot Virtual Machines (VMs) are worth considering. Spot VMs let you use unused Azure compute capacity at a significant discount. The price is variable and depends on the region and SKU. A Spot VM has the same characteristics as a pay-as-you-go VM, except that Azure can evict it at any time when it needs the capacity back. This makes Spot VMs a good fit for workloads that tolerate interruptions, such as batch processing jobs, dev/test environments, and large computations like AI development tasks whose progress you can checkpoint.
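To make the checkpointing idea concrete, here is a minimal sketch of a resumable job loop, using only the Python standard library. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON state layout, and the `stop_after` parameter (which only simulates an eviction) are assumptions made for illustration; a real training job would save model and optimizer state instead of a running total.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an eviction mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is an atomic rename when both paths are on the same filesystem.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def run(path, num_steps, checkpoint_every=10, stop_after=None):
    """Resume from the last checkpoint and process steps up to num_steps.

    stop_after simulates an eviction: the run aborts after that many steps
    without writing a final checkpoint.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < num_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated eviction, progress since last save is lost
        state["total"] += state["step"]  # placeholder for one unit of real work
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the job is interrupted, calling `run` again with the same path continues from the last saved step, so at most `checkpoint_every` steps of work are repeated. Choosing that interval is a trade-off between checkpoint overhead and the amount of work lost per eviction.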
To create an Azure Spot VM with Pulumi, use the `azure-native` Pulumi provider, which talks directly to Azure Resource Manager (ARM) and so manages your Azure resources in a native, fine-grained way.
The following program creates a Spot VM along with its supporting resources: a resource group, virtual network, subnet, public IP, network interface, and finally the VM itself. The VM is configured for an AI development workload, and you can adjust the VM size and other properties to match your needs.
Let's write a program to accomplish this:
```python
import pulumi
from pulumi_azure_native import resources
from pulumi_azure_native import network
from pulumi_azure_native import compute

# Configuration for the AI development environment.
# Adjust these values to match the requirements of your workload.
vm_size = "Standard_DS1_v2"  # Example VM size that supports Azure Spot instances
location = "East US"  # Azure region where resources will be deployed
username = "ai_user"  # Admin username for the VM
password = "Pa$$w0rd1234"  # Admin password for the VM (consider a more secure approach for production!)

# Create an Azure Resource Group to contain our resources.
# The location is set explicitly so the program does not depend on provider defaults.
resource_group = resources.ResourceGroup(
    "ai_resource_group",
    location=location,
)

# Create a Virtual Network for the VM
vnet = network.VirtualNetwork(
    "ai_vnet",
    resource_group_name=resource_group.name,
    address_space=network.AddressSpaceArgs(
        address_prefixes=["10.0.0.0/16"],
    ),
    location=location,
)

# Create a Subnet within the Virtual Network
subnet = network.Subnet(
    "ai_subnet",
    resource_group_name=resource_group.name,
    virtual_network_name=vnet.name,
    address_prefix="10.0.1.0/24",
)

# Create a Public IP address for the VM.
# With Dynamic allocation, the address is assigned only once the NIC is
# attached to a running VM.
public_ip = network.PublicIPAddress(
    "ai_public_ip",
    resource_group_name=resource_group.name,
    location=location,
    public_ip_allocation_method="Dynamic",
)

# Create a Network Interface for the VM with the Public IP
network_interface = network.NetworkInterface(
    "ai_nic",
    resource_group_name=resource_group.name,
    location=location,
    ip_configurations=[network.NetworkInterfaceIPConfigurationArgs(
        name="ai_nic_ip_config",
        subnet=network.SubnetArgs(
            id=subnet.id,
        ),
        public_ip_address=network.PublicIPAddressArgs(
            id=public_ip.id,
        ),
    )],
)

# Create the Azure Spot VM.
# The resource name uses hyphens, a safe choice because Azure's VM naming
# rules are stricter about underscores.
spot_vm = compute.VirtualMachine(
    "ai-spot-vm",
    resource_group_name=resource_group.name,
    location=location,
    # Specify that this should be a Spot instance
    priority="Spot",
    # What to do when the VM is evicted; Deallocate keeps the disks
    eviction_policy="Deallocate",
    hardware_profile=compute.HardwareProfileArgs(
        vm_size=vm_size,
    ),
    network_profile=compute.NetworkProfileArgs(
        network_interfaces=[compute.NetworkInterfaceReferenceArgs(
            id=network_interface.id,
        )],
    ),
    os_profile=compute.OSProfileArgs(
        computer_name="ai-dev-machine",
        admin_username=username,
        admin_password=password,
    ),
    storage_profile=compute.StorageProfileArgs(
        image_reference=compute.ImageReferenceArgs(
            publisher="Canonical",
            offer="UbuntuServer",
            sku="18.04-LTS",
            version="latest",
        ),
        os_disk=compute.OSDiskArgs(
            caching="ReadWrite",
            create_option="FromImage",
            managed_disk=compute.ManagedDiskParametersArgs(
                storage_account_type="Standard_LRS",
            ),
        ),
    ),
)

# Export the Public IP address of the VM
pulumi.export("public_ip", public_ip.ip_address)
```
This Pulumi program automates the following steps:
- It creates a new Azure Resource Group, `ai_resource_group`.
- Inside that resource group, it creates a virtual network (`vnet`) and a subnet (`subnet`).
- It provisions a public IP address (`public_ip`), which is used to reach the VM.
- It creates a network interface (`network_interface`) and associates it with the public IP and subnet.
- Finally, it provisions an Azure Spot VM (`spot_vm`) in the `East US` region with the settings we specified.
The `compute.VirtualMachine` resource sets `priority` to `Spot`, which makes it a Spot VM, and sets `eviction_policy` to `Deallocate`. With this policy, an evicted VM is stopped and deallocated rather than deleted: its disks are kept, and you can start it again once Spot capacity is available.
Remember to choose an Azure region where Spot VMs are available, and make sure `vm_size` is a size that Azure Spot supports in that region.
After `pulumi up` completes successfully, the public IP address of the VM is shown as a stack output. You can use this IP to SSH into the VM and start running your AI development tasks. Because the address is allocated dynamically, the output may be empty until the VM is running; `pulumi refresh` should then pick it up.
Note: As a best practice for production environments, handle secrets such as `password` more securely, for example with Azure Key Vault or with Pulumi's configuration system, which can encrypt and manage sensitive values.
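One way to follow the note on secrets is to read the password from the Pulumi stack configuration instead of hardcoding it. The fragment below is a minimal sketch meant to replace the hardcoded `vm_size` and `password` assignments in the program; the configuration key names `vmSize` and `adminPassword` are assumptions chosen for this example, not names the program requires.

```python
import pulumi

# Read settings from the stack configuration rather than from source code.
config = pulumi.Config()

# "vmSize" is a hypothetical key; fall back to the example size if it is unset.
vm_size = config.get("vmSize") or "Standard_DS1_v2"

# "adminPassword" is a hypothetical key. require_secret fails if the key is
# missing and returns a secret Output, so the value is encrypted in the stack
# state and masked in CLI output.
password = config.require_secret("adminPassword")
```

Before deploying, you would set the value with `pulumi config set --secret adminPassword <value>`, which stores it encrypted in the stack's configuration file. The resulting `password` can be passed to `admin_password` in `compute.OSProfileArgs` unchanged.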