Serverless Batch Inference with Azure Container Apps

Question

Pulumi · Accepted Answer

To set up serverless batch inference with Azure Container Apps, we deploy a containerized application that processes batches of data for inference. Azure Container Apps suits this workload because it offers event-driven autoscaling: it scales out when there are jobs to process and scales back in when the workload decreases.

Here's how we will set up the batch inference system:

1. **Container App Environment**: Before creating an Azure Container App, we need a Container Apps environment for it to run in. The environment acts as a shared boundary (networking and logging) around the container apps deployed into it.

2. **Container App**: We will create a container app resource. This will be the main component where our batch application runs. It should be configured with the details of the container image containing your batch inference logic. The container app will automatically scale based on demand or schedule.

3. **Revision and Ingress Settings**: When creating a Container App, we need to configure the revisions and ingress settings. Revisions are important for maintaining different versions of the app. Ingress settings help define how external traffic reaches the app, which is essential if you have a method for submitting jobs to the batch processor over HTTP.
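If jobs are submitted over HTTP, the container needs a small endpoint behind the ingress. The following is a minimal sketch using only the Python standard library; `run_inference`, `handle_job`, `JobHandler`, the JSON shape (`inputs` / `predictions`), and port 8080 are all assumptions made for illustration, not part of the original answer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_inference(batch):
    """Hypothetical placeholder model: returns the length of each input."""
    return [len(item) for item in batch]


def handle_job(body: bytes) -> bytes:
    """Decode a JSON job, run the whole batch, and encode the predictions."""
    payload = json.loads(body)
    predictions = run_inference(payload["inputs"])
    return json.dumps({"predictions": predictions}).encode("utf-8")


class JobHandler(BaseHTTPRequestHandler):
    """Accepts POST requests whose body is a JSON batch of inputs."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_job(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8080) -> None:
    """Start the server; the port should match the ingress target port."""
    HTTPServer(("0.0.0.0", port), JobHandler).serve_forever()
```

Calling `serve()` as the container's entry point would expose the handler; keeping `handle_job` as a plain function makes the batch logic easy to test without starting a server.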

### Requirements Before You Begin:
- Install Pulumi CLI and set up Azure credentials.
- Have a container image ready in an Azure Container Registry, or other accessible registry, with your batch inference logic. Make sure you have the image URL and credentials if the registry is private.
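- Make sure the `Microsoft.App` resource provider is registered on your Azure subscription. The Azure CLI `containerapp` extension is only needed if you also want to inspect the app with `az containerapp` commands.
- Decide which Azure Resource Group the resources will be deployed into. The program below creates a new one; reference an existing group instead if you already have one.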

### Pulumi Program - Serverless Batch Inference:

Below is the Python program that uses Pulumi with the Azure Native provider to create the required infrastructure:

```python
import pulumi
from pulumi_azure_native import app  # Container Apps live in the "app" module
from pulumi_azure_native import resources

# Create an Azure Resource Group
resource_group = resources.ResourceGroup("batch-inference-rg")

# Create an Azure Container Apps managed environment.
# Assumption: the provider version in use does not require an explicit
# Log Analytics workspace; add app_logs_configuration if yours does.
container_app_environment = app.ManagedEnvironment("batch-inference-env",
    resource_group_name=resource_group.name,
    location=resource_group.location,
)

# Define the container image (replace with your actual image URL and registry details)
container_image_name = "your-container-registry.azurecr.io/batch-inference:latest"

# Create an Azure Container App
container_app = app.ContainerApp("batch-inference-app",
    resource_group_name=resource_group.name,
    managed_environment_id=container_app_environment.id,
    template=app.TemplateArgs(
        containers=[app.ContainerArgs(
            name="batch-inference",
            image=container_image_name,
            resources=app.ContainerResourcesArgs(
                cpu=1.0,
                memory="2Gi",  # Memory is a string; 1 vCPU pairs with 2Gi on the Consumption plan
            ),
        )],
        scale=app.ScaleArgs(
            min_replicas=0,
            max_replicas=10,  # Set appropriate maximum replicas based on your workload and budget
            rules=[app.ScaleRuleArgs(
                name="http-scale-rule",  # Lowercase letters, digits, and hyphens
                http=app.HttpScaleRuleArgs(
                    # Number of concurrent requests that triggers a scale-out
                    metadata={"concurrentRequests": "50"},
                ),
            )],
        ),
    ),
    identity=app.ManagedServiceIdentityArgs(type="SystemAssigned"),  # Using SystemAssigned identity for simplicity
)

# Output the FQDN of the Azure Container App.
# Ingress is not configured above, so this is None until ingress is enabled.
pulumi.export("fqdn", container_app.configuration.apply(
    lambda config: config.ingress.fqdn if config and config.ingress else None))
```

### Explanation of the Program:

- **Resource Group**: The `ResourceGroup` is a logical container into which Azure resources are deployed and managed. It's the foundation for organizing resources in your Azure subscription.

- **Container App Environment**: The managed environment resource (`ManagedEnvironment` in the `pulumi_azure_native.app` module) provides the context in which our container apps run. It's a prerequisite for deploying the actual Container App and is linked to a specific resource group.

- **Container App**: The `ContainerApp` is the resource that represents our serverless batch inference app. It contains the definition of the containers it will run, including the image to use, the resources allocated to it (like CPU and memory), and autoscaling rules.

- **Ingress Settings**: In this example, ingress settings are left out for simplicity, so the app has no HTTP endpoint. Enable ingress if jobs are submitted over HTTP(S); an HTTP scale rule only reacts to requests that arrive through ingress. For batch processing you may instead run jobs internally or trigger them from other Azure services (such as queues or events), in which case ingress might not be needed and a scale rule tied to that event source is the better fit.
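For the queue-triggered case, the worker inside the container typically drains pending jobs in fixed-size batches. The sketch below illustrates that loop with the standard library's in-memory `queue.Queue` standing in for a real Azure queue; `predict`, `drain_in_batches`, and the batch size are hypothetical names and values chosen for the example.

```python
import queue


def predict(batch):
    """Hypothetical placeholder for a real model call; upper-cases text."""
    return [item.upper() for item in batch]


def drain_in_batches(jobs, batch_size):
    """Pull jobs until the queue is empty, running inference batch by batch."""
    results = []
    while True:
        batch = []
        # Collect up to batch_size jobs without blocking.
        while len(batch) < batch_size:
            try:
                batch.append(jobs.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return results  # Queue drained: the replica can go idle.
        results.extend(predict(batch))


jobs = queue.Queue()
for text in ["a", "b", "c", "d", "e"]:
    jobs.put(text)

print(drain_in_batches(jobs, batch_size=2))
```

Once the function returns, the worker has nothing left to do, which is the point at which a scale-to-zero rule can remove the replica.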

Make sure to replace `your-container-registry.azurecr.io/batch-inference:latest` with the actual container image path that you will use for the batch inference. The program as written assumes the image can be pulled without extra registry configuration. For a private registry, add registry credentials to the app's configuration; for Azure Container Registry you can instead use a managed identity that has been granted the `AcrPull` role on the registry.

After setting up your Pulumi stack and creating a Python file with the above content, navigate to the directory with the Pulumi file and run `pulumi up`. This command will start the deployment process of the resources you've described in the code.

Because this setup is serverless, the app can scale down to zero when there is no traffic or work, so you pay only while batch jobs are being processed. Adjust the scaling settings as needed based on the concurrency, CPU, and memory requirements of your specific batch processing workloads.
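To reason about the scaling settings, it can help to estimate how many replicas a given load would request. The sketch below is a rough approximation only: it assumes the replica count is the concurrent request count divided by the per-replica target, rounded up and clamped to the configured bounds. The function name `desired_replicas` is hypothetical, and the platform's actual behavior (polling intervals, cooldown periods) is not modeled.

```python
import math


def desired_replicas(concurrent_requests, target_per_replica=50,
                     min_replicas=0, max_replicas=10):
    """Approximate replica count for an HTTP concurrency scale rule."""
    wanted = math.ceil(concurrent_requests / target_per_replica)
    # Clamp to the min/max replicas configured on the container app.
    return max(min_replicas, min(max_replicas, wanted))


for load in (0, 40, 120, 1000):
    print(load, desired_replicas(load))
```

With the values used in the program above (a target of 50 and a maximum of 10 replicas), any load above 500 concurrent requests is capped at 10 replicas, so raise `max_replicas` if you expect larger bursts.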