1. Real-time AI Model Inference with Azure Function Subscriptions

    To set up real-time AI model inference with Azure Function subscriptions, we will create several resources using Pulumi with the Azure Native provider. We'll deploy an Azure Function to serve as our real-time inference endpoint, and we'll set up an Azure Event Grid Topic with a subscription that triggers that function. For the AI model, we will use Azure Machine Learning to register our model and deploy it to an Azure Kubernetes Service (AKS) inference endpoint. Here's what we're going to do:

    1. Create an Azure Resource Group: A logical container that holds related resources for an Azure solution.

    2. Deploy an Azure Function App: A compute resource that enables you to run functions without having to explicitly provision and manage infrastructure.

    3. Create an Event Grid Topic: An event routing service that uses a publish-subscribe model for uniform event consumption.

    4. Set up an Event Grid Subscription: This subscribes to our topic and sends events to a specified endpoint, which will be our Azure Function.

    5. Register an AI Model: Use Azure Machine Learning Services to register your pre-trained AI model.

    6. Create an AKS Cluster: Provision a managed Kubernetes cluster where the model will be hosted.

    7. Deploy an Inference Endpoint: Use the InferenceEndpoint resource to create an endpoint for real-time inferencing on AKS.

    The following Pulumi program in Python sets up the necessary resources for this scenario. Please note that for the Azure Machine Learning model registration part, you'll need to have your model ready, either uploaded directly or hosted at a location the workspace can reach. Also note that the machine-learning resource schemas vary between versions of the Azure Native provider, so verify the property names below against the documentation for the version you have installed.

    import pulumi
    import pulumi_azure_native as azure_native
    from pulumi_azure_native import resources, eventgrid, machinelearningservices

    # Create a new resource group
    resource_group = resources.ResourceGroup("ai_inference_rg")

    # Deploy the Azure Function App and related resources.
    # The Y1/Dynamic SKU is the serverless Consumption plan.
    app_service_plan = azure_native.web.AppServicePlan(
        "ai_inference_plan",
        resource_group_name=resource_group.name,
        kind="FunctionApp",
        sku=azure_native.web.SkuDescriptionArgs(
            name="Y1",
            tier="Dynamic",
        ),
    )

    function_app = azure_native.web.WebApp(
        "ai_inference_function_app",
        resource_group_name=resource_group.name,
        server_farm_id=app_service_plan.id,
        kind="functionapp",  # mark the site as a Function App rather than a web app
    )

    # Event Grid Topic to which inference events will be published
    topic = eventgrid.Topic(
        "ai_inference_topic",
        resource_group_name=resource_group.name,
    )

    # Event Grid Subscription that delivers topic events to the function's
    # webhook endpoint; the subscription is scoped to the topic itself.
    event_subscription = eventgrid.EventSubscription(
        "ai_inference_subscription",
        scope=topic.id,
        destination=eventgrid.WebHookEventSubscriptionDestinationArgs(
            endpoint_type="WebHook",
            endpoint_url=function_app.default_host_name.apply(
                lambda host: f"https://{host}/api/events"
            ),
        ),
    )

    # Register an AI model version in an existing Azure Machine Learning
    # workspace. The ML resource schemas vary across provider versions, so
    # check these property names against your installed azure-native version.
    registered_model = machinelearningservices.ModelVersion(
        "ai_model_version",
        name="<MODEL_NAME>",
        version="<MODEL_VERSION>",
        workspace_name="<WORKSPACE_NAME>",
        resource_group_name=resource_group.name,
        model_version_properties=machinelearningservices.ModelVersionTypeArgs(
            model_type="DNN",
            description="Model for real-time inference",
            stage="Production",
        ),
    )

    # Create an AKS cluster to host the deployed model. AKS requires an
    # identity and at least one System-mode node pool.
    aks_cluster = azure_native.containerservice.ManagedCluster(
        "ai_inference_aks",
        resource_group_name=resource_group.name,
        agent_pool_profiles=[
            azure_native.containerservice.ManagedClusterAgentPoolProfileArgs(
                name="agentpool",
                mode="System",
                count=1,
                vm_size="Standard_DS2_v2",
                os_type="Linux",
            )
        ],
        dns_prefix="ai-inference",
        identity=azure_native.containerservice.ManagedClusterIdentityArgs(
            type="SystemAssigned",
        ),
    )

    # Create an inference endpoint for real-time scoring. As with the model
    # registration above, verify the exact schema for your provider version.
    inference_endpoint = machinelearningservices.InferenceEndpoint(
        "ai_inference_endpoint",
        resource_group_name=resource_group.name,
        workspace_name="<WORKSPACE_NAME>",
        location=resource_group.location,
        identity=machinelearningservices.IdentityArgs(
            type="SystemAssigned",
        ),
        tags={},
        properties=machinelearningservices.InferenceEndpointPropertiesArgs(
            group_id=registered_model.id,
        ),
    )

    # Export the function's direct HTTP inference route (distinct from the
    # /api/events webhook used by Event Grid)
    pulumi.export(
        "function_app_endpoint",
        function_app.default_host_name.apply(
            lambda host: f"https://{host}/api/inference"
        ),
    )

    # Export the scoring URI of the inference endpoint once it is available
    pulumi.export(
        "aks_endpoint",
        inference_endpoint.properties.apply(
            lambda props: props["scoringUri"] if props else "Endpoint not available"
        ),
    )
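
    A couple of notes on the choices above: the Y1/Dynamic SKU places the Function App on the serverless Consumption plan, so you only pay while events are being processed, and the subscription uses a plain WebHook destination, which means the function behind /api/events must answer Event Grid's subscription validation handshake before real events are delivered (a sketch of such a handler appears later in this article).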

    This program accomplishes the following:

    • It sets up the base infrastructure for hosting an AI model in Azure.
    • It registers and serves the AI model through Azure Machine Learning Services and exposes it for real-time inference.
    • It creates an event-driven architecture that uses an Azure Function, an Event Grid Topic, and an Event Grid Subscription to process events in real time.

    Once this infrastructure is deployed, your Azure Function can listen for incoming events. When an event is published to the Event Grid Topic, the Azure Function is triggered and can use the AKS-hosted AI model to perform real-time inference and return results.
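
    For illustration, here is a minimal sketch of what the function behind /api/events could look like. It is not part of the Pulumi program above: it assumes an HTTP-triggered Python function, and SCORING_URI is an assumed app-setting name for wherever you store the endpoint's scoring URI. Because the subscription uses a plain webhook destination, the handler must answer Event Grid's validation handshake:

    import json
    import logging
    import os

    import azure.functions as func
    import requests

    # Assumed app setting holding the AKS endpoint's scoring URI,
    # e.g. populated from `pulumi stack output aks_endpoint`.
    SCORING_URI = os.environ.get("SCORING_URI", "")


    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Event Grid delivers events to a webhook as a JSON array.
        events = req.get_json()

        for event in events:
            # Event Grid validates a new webhook endpoint with a handshake
            # event; echo the validation code back before real events arrive.
            if event.get("eventType") == "Microsoft.EventGrid.SubscriptionValidationEvent":
                code = event["data"]["validationCode"]
                return func.HttpResponse(
                    json.dumps({"validationResponse": code}),
                    mimetype="application/json",
                )

            # Forward the event payload to the AKS-hosted scoring endpoint.
            resp = requests.post(
                SCORING_URI,
                headers={"Content-Type": "application/json"},
                data=json.dumps(event.get("data", {})),
            )
            resp.raise_for_status()
            logging.info("Inference result for %s: %s", event.get("id"), resp.text)

        return func.HttpResponse(status_code=200)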

    Remember to replace the placeholders <MODEL_NAME>, <MODEL_VERSION>, and <WORKSPACE_NAME> with your actual model's name, version, and Azure Machine Learning workspace name. Also configure your Azure Function's code to match your inference needs (see the handler sketch above); the scoring URI is generated dynamically after deployment.
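
    If you would rather not edit the program by hand, those values can come from Pulumi stack configuration instead. A minimal sketch, assuming config keys named modelName, modelVersion, and workspaceName (hypothetical names, pick your own):

    import pulumi

    config = pulumi.Config()

    # Set these once per stack, for example:
    #   pulumi config set modelName my-model
    #   pulumi config set modelVersion 1
    #   pulumi config set workspaceName my-aml-workspace
    model_name = config.require("modelName")
    model_version = config.require("modelVersion")
    workspace_name = config.require("workspaceName")

    These variables would then replace the <MODEL_NAME>, <MODEL_VERSION>, and <WORKSPACE_NAME> literals in the program above.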

    You can deploy this Pulumi program by running pulumi up in your terminal, assuming the Pulumi CLI is installed and your Azure credentials are configured. Make sure to review the infrastructure changes Pulumi proposes before confirming the deployment.
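
    After the deployment completes, one way to verify the event flow end to end is to publish a test event to the topic. This is a standalone client script, separate from the Pulumi program; the environment variable names are assumptions, and you can look up the topic endpoint and key with az eventgrid topic show and az eventgrid topic key list:

    import json
    import os
    import uuid
    from datetime import datetime, timezone

    import requests

    # Assumed environment variables holding the topic endpoint and access key
    topic_endpoint = os.environ["EVENTGRID_TOPIC_ENDPOINT"]
    topic_key = os.environ["EVENTGRID_TOPIC_KEY"]

    # A minimal Event Grid event envelope; "data" carries the inference payload
    event = [{
        "id": str(uuid.uuid4()),
        "eventType": "ai.inference.requested",
        "subject": "models/<MODEL_NAME>",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "dataVersion": "1.0",
        "data": {"features": [0.1, 0.2, 0.3]},
    }]

    resp = requests.post(
        topic_endpoint,
        headers={"aeg-sas-key": topic_key, "Content-Type": "application/json"},
        data=json.dumps(event),
    )
    resp.raise_for_status()
    print("Event published:", resp.status_code)

    If the subscription is wired correctly, the function's logs should show the validation handshake (on subscription creation) followed by the forwarded inference call.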