Real-time AI Model Serving with Azure Container Apps
Real-time AI model serving means deploying a machine learning model to a cloud service so that it can receive data, run it through the model, and return a prediction or inference with low latency. For this purpose, we will use Azure Container Apps, a service that runs containerized applications and scales them based on HTTP traffic or other events. This makes it a reasonable fit for AI models, because real-time inference services typically see variable load.
Below is an outline of the steps we will follow to serve an AI model in real-time with Azure Container Apps using Pulumi:
- Import Required Libraries: We will import the required Pulumi libraries for Azure.
- Read Configurations: If there are specific configurations like container image or app name, we will read those.
- Create a Resource Group: We will create a new resource group in Azure to organize our resources.
- Create a Container App Environment: A prerequisite for container apps.
- Deploy the Container App: We will define the container app specification, including the container image that contains the AI model, environment variables, and other settings.
- Export the App URL: Export the URL of the deployed container app so that we can call the API to make real-time inferences.
Let's go ahead and write the actual Pulumi program in Python to realize the above steps.
    import pulumi
    import pulumi_azure_native as azure_native
    # Container Apps resources live in the `app` module of pulumi_azure_native.
    from pulumi_azure_native import app

    # 1. Import Required Libraries
    # We've already imported the necessary libraries.

    # 2. Read Configurations (if any)
    # Assuming the container image and names are configured outside of this program.

    # 3. Create a Resource Group
    resource_group = azure_native.resources.ResourceGroup("ai_model_serving_rg")

    # 4. Create a Container App Environment
    container_app_environment = app.ManagedEnvironment(
        "containerAppEnv",
        resource_group_name=resource_group.name,
        location=resource_group.location,
    )

    # 5. Deploy the Container App
    ai_model_serving_app = app.ContainerApp(
        "aiModelServingApp",
        resource_group_name=resource_group.name,
        managed_environment_id=container_app_environment.id,
        configuration=app.ConfigurationArgs(
            ingress=app.IngressArgs(
                external=True,    # Reachable from outside the environment
                target_port=80,   # Port the container listens on
                transport="http",
            ),
            # Placeholder value; in practice, supply it as a Pulumi secret.
            secrets=[app.SecretArgs(name="model-secret", value="somesecret")],
            # Optional: Enable Dapr for distributed application capabilities
            # dapr=app.DaprArgs(
            #     app_id="ai-model-app",
            #     app_port=80,
            #     enabled=True,
            # ),
        ),
        template=app.TemplateArgs(
            containers=[app.ContainerArgs(
                name="ai-model-container",
                # Replace with your container image. A private registry also
                # needs credentials under `configuration.registries`.
                image="your-registry.azurecr.io/ai-model:latest",
                env=[
                    app.EnvironmentVarArgs(
                        name="MODEL_ENDPOINT",
                        secret_ref="model-secret",
                    ),
                    # Define any additional environment variables here.
                ],
                resources=app.ContainerResourcesArgs(
                    cpu=0.5,         # CPU cores
                    memory="1.0Gi",  # Consumption plan pairs 0.5 CPU with 1.0Gi
                ),
            )],
            scale=app.ScaleArgs(
                min_replicas=2,  # Minimum number of container replicas
                max_replicas=5,  # Maximum number of container replicas
            ),
        ),
    )

    # 6. Export the App URL
    pulumi.export(
        "app_url",
        ai_model_serving_app.configuration.apply(lambda c: f"https://{c.ingress.fqdn}"),
    )
Here is what the code is doing:
- Resource Group: We create a new Resource Group, `ai_model_serving_rg`, that keeps all the Azure resources for this application organized.
- Container App Environment: This resource, `containerAppEnv`, creates the environment that the container app runs in. It is a prerequisite to deploying the app.
- Container App: The `ContainerApp` resource called `aiModelServingApp` is where we deploy our application. The `configuration` sets up ingress and the secrets that environment variables can reference. The `template` defines the container(s) that will run, their image, environment variables, and resources like CPU and memory. We also specify autoscaling parameters with `scale`.
- Export App URL: Finally, we export `app_url`, which is the URL to access the AI model app externally.
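The CPU and memory values in the `template` cannot be chosen freely. As far as we are aware, the Consumption plan of Azure Container Apps only accepts CPU in steps of 0.25 cores, paired with 2 Gi of memory per core. The helper below is a minimal sketch of that rule for checking a pair before deploying. The function names are hypothetical, the ratio and step are assumptions to verify against the current Azure documentation, and the plan's upper limit is not modeled.

```python
from fractions import Fraction

# Assumptions (check the current Azure docs): the Consumption plan allocates
# CPU in steps of 0.25 cores and expects 2 Gi of memory per core.
GIB_PER_VCPU = 2
CPU_STEP = Fraction(1, 4)


def _cpu_fraction(cpu: float) -> Fraction:
    """Convert `cpu` to an exact fraction and reject values off the 0.25 grid."""
    value = Fraction(str(cpu))
    if value <= 0 or value % CPU_STEP != 0:
        raise ValueError(f"cpu must be a positive multiple of 0.25, got {cpu}")
    return value


def memory_for_cpu(cpu: float) -> str:
    """Return the memory string assumed to pair with `cpu` cores."""
    gib = _cpu_fraction(cpu) * GIB_PER_VCPU
    return f"{float(gib):.1f}Gi"


def is_valid_pair(cpu: float, memory: str) -> bool:
    """Check whether `memory` (for example "1.0Gi") matches `cpu` under the assumed rule."""
    if not memory.endswith("Gi"):
        return False
    try:
        return Fraction(memory[:-2]) == _cpu_fraction(cpu) * GIB_PER_VCPU
    except (ValueError, ZeroDivisionError):
        return False
```

Under these assumptions, `memory_for_cpu(0.5)` gives `"1.0Gi"`, and `is_valid_pair(0.5, "1.5Gi")` is `False`, because 1.5 Gi would pair with 0.75 cores.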
With this Pulumi program, an AI model service can be deployed on Azure Container Apps. The platform keeps the replica count between the minimum and maximum set in `scale`, adding or removing replicas as load changes, and because ingress is external, the app can be reached from outside the Azure network.
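Once the stack is up, `pulumi stack output app_url` prints the exported value, and a client can call the service with it. The following is a minimal client sketch using only the Python standard library. The `/predict` path and the `{"features": [...]}` payload are hypothetical, since the real contract depends on the server inside your container image. The sketch also assumes `app_url` includes the `https://` scheme, so add it if your exported value is a bare hostname.

```python
import json
import urllib.request


def build_inference_request(app_url: str, features: list[float]) -> urllib.request.Request:
    """Build a JSON POST request for the (hypothetical) /predict endpoint."""
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=app_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(app_url: str, features: list[float], timeout: float = 10.0) -> dict:
    """Send the features to the deployed app and return the decoded JSON reply."""
    request = build_inference_request(app_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes it possible to inspect the request without a live deployment.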
Make sure you have the Azure Pulumi provider configured and that you replace `your-registry.azurecr.io/ai-model:latest` with the actual location and tag of your container image.
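The program assumes that the image contains an HTTP server listening on the ingress target port (80 here). As an illustration of what such a container might run, here is a minimal sketch built on Python's standard `http.server` module. The `/predict` route, the JSON shape, and the placeholder `predict` function (which just averages its inputs) are all hypothetical stand-ins for your real model and serving framework.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(features):
    """Placeholder for a real model: returns the mean of the inputs."""
    return sum(features) / len(features) if features else 0.0


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /predict route is served.
        if self.path != "/predict":
            self.send_error(404, "unknown path")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))["features"]
            body = json.dumps({"prediction": predict(features)}).encode("utf-8")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON object with a 'features' list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the sketch quiet; a real service would log requests.


def make_server(port: int = 80) -> ThreadingHTTPServer:
    """Bind on all interfaces so the platform's ingress can reach the container."""
    return ThreadingHTTPServer(("0.0.0.0", port), InferenceHandler)
```

Inside the container, the entry point would call `make_server().serve_forever()`. A production service would more likely use a dedicated serving framework, but the contract is the same: listen on the port that `target_port` names.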