Real-time AI Model Serving with Azure Container Apps

Pulumi · Accepted Answer

Real-time AI model serving means deploying a machine learning model to a cloud service that accepts requests, runs the model on the incoming data, and returns a prediction or inference with low latency. For this purpose, we will use Azure Container Apps, a service for deploying containerized applications that scales based on HTTP traffic or other events. It suits AI model serving because it can absorb the variable load that is typical of real-time inference services.

Below is an outline of the steps we will follow to serve an AI model in real-time with Azure Container Apps using Pulumi:

1. **Import Required Libraries**: We will import the required Pulumi libraries for Azure.
2. **Read Configurations**: We will read any settings the program needs, such as the container image or app name.
3. **Create a Resource Group**: We will create a new resource group in Azure to organize our resources.
4. **Create a Container App Environment**: We will create the environment that hosts the container app; it is a prerequisite for deploying one.
5. **Deploy the Container App**: We will define the container app specifications, including the container image that contains the AI model, environment variables, and other settings.
6. **Export the App URL**: Export the URL of the deployed container app so that we can call the API to make real-time inferences.

Let's go ahead and write the actual Pulumi program in Python to realize the above steps.

```python
import pulumi
import pulumi_azure_native as azure_native
from pulumi_azure_native import app

# 1. Import Required Libraries
# Container Apps resources live in the `app` module of the Azure Native provider.

# 2. Read Configurations (if any)
# The image and secret are hard-coded here for brevity; in a real project,
# read them from stack configuration instead.

# 3. Create a Resource Group
resource_group = azure_native.resources.ResourceGroup("ai_model_serving_rg")

# 4. Create a Container App Environment
container_app_environment = app.ManagedEnvironment(
    "containerAppEnv",
    resource_group_name=resource_group.name,
    location=resource_group.location,
)

# 5. Deploy the Container App
ai_model_serving_app = app.ContainerApp(
    "aiModelServingApp",
    # Container App names must be lowercase (letters, digits, hyphens),
    # so set the Azure name explicitly instead of deriving it from the logical name.
    container_app_name="ai-model-serving-app",
    resource_group_name=resource_group.name,
    managed_environment_id=container_app_environment.id,
    configuration=app.ConfigurationArgs(
        ingress=app.IngressArgs(
            external=True,   # Expose the app outside the environment
            target_port=80,  # Port the model server listens on inside the container
            transport="http",
        ),
        # Placeholder value; do not commit real secrets to source code.
        secrets=[app.SecretArgs(name="model-secret", value="somesecret")],
        # Optional: Enable Dapr for distributed application capabilities
        # dapr=app.DaprArgs(
        #     app_id="ai-model-app",
        #     app_port=80,
        #     enabled=True,
        # ),
    ),
    template=app.TemplateArgs(
        containers=[app.ContainerArgs(
            name="ai-model-container",
            image="your-registry.azurecr.io/ai-model:latest",  # Replace with your container image
            env=[
                app.EnvironmentVarArgs(
                    name="MODEL_ENDPOINT",
                    secret_ref="model-secret",  # Must match a name in `secrets` above
                ),
                # Define any additional environment variables here.
            ],
            resources=app.ContainerResourcesArgs(
                cpu=0.5,       # Define CPU requirements (vCPU cores)
                memory="1Gi",  # Define memory requirements; 0.5 vCPU pairs with 1Gi on the Consumption plan
            ),
        )],
        scale=app.ScaleArgs(
            min_replicas=2,  # Define minimum number of container replicas
            max_replicas=5,  # Define maximum number of container replicas
        ),
    ),
)

# 6. Export the App URL
# The ingress FQDN is only known after deployment, so build the URL inside apply().
pulumi.export(
    "app_url",
    ai_model_serving_app.configuration.apply(lambda c: f"https://{c.ingress.fqdn}"),
)
```

Here is what the code is doing:

- **Resource Group**: We create a new Resource Group `ai_model_serving_rg` that will keep all our Azure resources for this application organized.
- **Container App Environment**: This object creates an environment for the container app, `containerAppEnv`. It's a prerequisite to deploying the app.
- **Container App**: The `ContainerApp` resource called `aiModelServingApp` is where we deploy our application. The `configuration` sets up ingress control and secrets for environment variables. The `template` defines the container(s) that will run, their image, environment variables, and resources like CPU and memory. We also specify autoscaling parameters with `scale`.
- **Export App URL**: Finally, we export the `app_url` which is the URL to access the AI model app externally.
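
Once the app is deployed, a client can send inference requests to the exported address. The sketch below shows one way to do that with the Python standard library. It assumes the container exposes a `/predict` route that accepts a JSON body of the form `{"features": [...]}` and returns JSON; the route name, the payload shape, and the helper names are hypothetical and depend entirely on how your model server is written.

```python
import json
import urllib.request


def build_inference_request(app_url: str, features: list[float]) -> urllib.request.Request:
    """Build a JSON POST request for the (hypothetical) /predict route."""
    # Accept either a bare host name or a full URL.
    if not app_url.startswith(("http://", "https://")):
        app_url = f"https://{app_url}"
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{app_url.rstrip('/')}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(app_url: str, features: list[float], timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response from the model server."""
    request = build_inference_request(app_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

You can pass the value printed by `pulumi stack output app_url` as `app_url`, for example `predict(app_url, [0.1, 0.2, 0.3])`.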

With this Pulumi program, you can deploy an AI model service on Azure Container Apps. The platform scales the app between the minimum of 2 and the maximum of 5 replicas set in `scale`, and external ingress makes it reachable from outside the Azure network.
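
If you want more control over when replicas are added, you can attach an explicit scale rule. The fragment below is a minimal sketch of an HTTP concurrency rule, assuming the Container Apps types are imported from the provider's `app` module. The rule name and the threshold of 20 concurrent requests are illustrative values, not recommendations; tune them against your model's actual latency and throughput.

```python
from pulumi_azure_native import app

# Hypothetical threshold: scale out when a replica handles
# more than about 20 concurrent HTTP requests.
scale = app.ScaleArgs(
    min_replicas=2,
    max_replicas=5,
    rules=[
        app.ScaleRuleArgs(
            name="http-concurrency",  # Illustrative rule name
            http=app.HttpScaleRuleArgs(
                metadata={"concurrentRequests": "20"},
            ),
        )
    ],
)
```

Pass this object as the `scale` argument of the container app's template in place of the one that only sets the replica limits.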

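Make sure the Azure provider for Pulumi is configured with valid credentials, and replace `your-registry.azurecr.io/ai-model:latest` with the actual location and tag of your container image. If the image lives in a private registry, the container app also needs credentials for that registry.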
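
Rather than editing the image name and secret in the source, you could read them from stack configuration, which is what step 2 of the outline has in mind. This is a minimal sketch; the config keys `image` and `modelSecret` are hypothetical names chosen for illustration.

```python
import pulumi

config = pulumi.Config()

# "image" and "modelSecret" are hypothetical config keys for this sketch.
# Fall back to the placeholder image when no value has been set.
image = config.get("image") or "your-registry.azurecr.io/ai-model:latest"

# require_secret fails fast if the key is missing and keeps the value
# marked as a secret so it is encrypted in the stack state.
model_secret = config.require_secret("modelSecret")
```

You would set these values with `pulumi config set image <your-image>` and `pulumi config set --secret modelSecret <value>`, then pass `image` and `model_secret` to the container and secret definitions in place of the hard-coded strings.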