Batch AI Inference at Scale Using Azure Batch

    When you run AI inference at scale, you generally need to process large volumes of data while making efficient use of the available compute. Azure Batch is a managed service for running large-scale parallel and high-performance computing (HPC) applications in the cloud. It schedules compute-intensive work onto a managed collection of virtual machines and can automatically scale compute resources to meet the needs of your jobs.

    In this context, Batch AI inference would involve creating a Batch account, setting up compute nodes within a pool, and running jobs that execute your AI inference application on these nodes. To accomplish this with Pulumi, you will need several resources:

    • Batch Account: The top-level resource through which you use the Batch service. A Batch account lets you define and manage pools of compute resources and submit and track jobs.

    • Pool: A pool contains the compute nodes, the virtual machines that execute your batch jobs. You can specify the size of the VMs in the pool, configure applications, set scheduling policies, and so on.

    • Job: A job manages the tasks that run on the compute nodes in a pool. You can define job constraints, assign tasks, manage the task lifecycle, and so on.

    The following program shows how you might set up these resources using Pulumi with Azure. It is a basic setup to get you started:

    1. Creating a Batch account.
    2. Creating a Pool within that Batch account.
    3. Running a Job and Tasks within that Pool.
    import pulumi
    import pulumi_azure_native as azure_native

    # Step 1: Create a new resource group if you don't have one already.
    resource_group = azure_native.resources.ResourceGroup('myresourcegroup')

    # Step 2: Create an Azure Batch account.
    batch_account = azure_native.batch.BatchAccount('mybatchaccount',
        resource_group_name=resource_group.name,
        location=resource_group.location)

    # Step 3: Create a pool in the Batch account to run your jobs.
    batch_pool = azure_native.batch.Pool('mypool',
        account_name=batch_account.name,
        pool_name='mypool',
        display_name='My AI Inference Pool',
        resource_group_name=resource_group.name,
        vm_size='STANDARD_A1_v2',  # The size of the VMs in the pool; choose it to match your workload.
        # The pool needs a VM image and a matching Batch node agent; the Ubuntu
        # image below is an example, adjust it for your workload.
        deployment_configuration=azure_native.batch.DeploymentConfigurationArgs(
            virtual_machine_configuration=azure_native.batch.VirtualMachineConfigurationArgs(
                image_reference=azure_native.batch.ImageReferenceArgs(
                    publisher='canonical',
                    offer='0001-com-ubuntu-server-jammy',
                    sku='22_04-lts',
                    version='latest',
                ),
                node_agent_sku_id='batch.node.ubuntu 22.04',
            ),
        ),
        scale_settings=azure_native.batch.ScaleSettingsArgs(
            auto_scale=azure_native.batch.AutoScaleSettingsArgs(
                # An auto-scale formula must assign $TargetDedicatedNodes;
                # this minimal formula keeps a single dedicated node.
                formula='$TargetDedicatedNodes = 1;',
            ),
        ))

    # Step 4: Create a Batch job on the pool.
    batch_job = azure_native.batch.Job('myjob',
        pool_info=azure_native.batch.PoolInformationArgs(
            pool_id=batch_pool.pool_name,
        ),
        job_name='myaiinferencejob',
        account_name=batch_account.name,
        resource_group_name=resource_group.name)

    # Step 5: Create a task to run in the Batch job.
    task = azure_native.batch.Task('mytask',
        job_name=batch_job.job_name,
        task_name='myinference',
        account_name=batch_account.name,
        command_line='bash -c "echo Hello World from the Batch AI task!"',
        resource_group_name=resource_group.name)

    # Export the account endpoint for further use (for example, submitting work from client code).
    pulumi.export('batch_account_endpoint', batch_account.account_endpoint)

    This program sets the stage for your Batch AI operations. Here's a brief overview of the steps:

    • Creates a resource group to hold all related resources.
    • Defines a Batch account within the resource group.
    • Defines a compute pool with a single small VM and sets up auto-scaling that you can customize to your needs (an example scaling formula follows this list).
    • Creates a new job within the Batch account, specifying the pool ID.
    • Finally, adds a simple task that echoes "Hello World" (you would replace this with your specific inference command).
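
    For the auto-scaling bullet above, a more realistic formula would grow the pool with the backlog of pending tasks. The snippet below is a sketch adapted from the sample auto-scale formulas in the Azure Batch documentation; the 25-node cap and the 180-second sampling window are illustrative values you would tune for your workload:

    # A pending-task-based auto-scale formula; pass it as the `formula` argument
    # of AutoScaleSettingsArgs in place of the minimal one used above.
    autoscale_formula = """
    startingNumberOfVMs = 1;
    maxNumberOfVMs = 25;
    pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
    pendingTaskSamples = pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
    $TargetDedicatedNodes = min(maxNumberOfVMs, pendingTaskSamples);
    $NodeDeallocationOption = taskcompletion;
    """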

    This program is simplified and should be customized to fit your inference workload, particularly the command line in the task definition. You would typically upload your AI model and associated data to Azure Blob Storage and pass them as parameters to the inference script that runs on the VMs.
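
    As a sketch of that pattern, the snippet below continues the program above: it uploads a model file to Blob Storage with Pulumi and builds a task command line that passes the blob's URL to an inference script. The names here (the modelstore account, model.onnx, run_inference.py) are placeholders, and in practice you would also grant the pool access to the blob, for example with a SAS token or a managed identity, rather than relying on anonymous access:

    # Hypothetical storage for the model; all names are placeholders.
    storage_account = azure_native.storage.StorageAccount('modelstore',
        resource_group_name=resource_group.name,
        sku=azure_native.storage.SkuArgs(name='Standard_LRS'),
        kind='StorageV2')

    container = azure_native.storage.BlobContainer('models',
        resource_group_name=resource_group.name,
        account_name=storage_account.name)

    model_blob = azure_native.storage.Blob('model',
        resource_group_name=resource_group.name,
        account_name=storage_account.name,
        container_name=container.name,
        source=pulumi.FileAsset('model.onnx'))  # Local file uploaded during `pulumi up`.

    # Build the blob URL from the account's blob endpoint and pass it to the
    # (hypothetical) inference script in the task's command line.
    model_url = pulumi.Output.concat(
        storage_account.primary_endpoints.blob, container.name, '/', model_blob.name)
    command_line = model_url.apply(
        lambda url: f'bash -c "python3 run_inference.py --model-url {url}"')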

    Adjust the vm_size, scale_settings, and command_line to suit your application's needs. You may also want to look into advanced features such as task dependencies, job scheduling, and comprehensive monitoring, which Azure Batch supports but which are not detailed here.
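
    For example, if your workload runs on a predictable schedule, you might swap the auto-scaling configuration for a fixed pool size. A minimal sketch, assuming four dedicated nodes are enough:

    # A fixed-size alternative to auto-scaling; resize by changing the value
    # and re-running `pulumi up`. Low-priority (spot) nodes can reduce cost
    # for interruption-tolerant inference work.
    scale_settings = azure_native.batch.ScaleSettingsArgs(
        fixed_scale=azure_native.batch.FixedScaleSettingsArgs(
            target_dedicated_nodes=4,
            target_low_priority_nodes=0,
        ),
    )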

    You run this Pulumi program with the Pulumi CLI by executing pulumi up, after setting up Pulumi and authenticating with Azure. The CLI presents a summary of the resources to be created, modified, or replaced, and asks you to confirm before proceeding with the deployment.
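
    A typical session, assuming you authenticate through the Azure CLI and name the stack dev (both are just example choices), looks like this:

    az login                                        # Authenticate with Azure
    pulumi stack init dev                           # Create a stack for this deployment
    pulumi config set azure-native:location eastus  # Choose a default region (example value)
    pulumi up                                       # Preview the changes, then confirm to deploy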