1. Scalable Model Deployment with Azure Machine Learning


    To deploy a scalable model with Azure Machine Learning using Pulumi, you need three things: an Azure Machine Learning workspace, a compute cluster to run the model, and the model itself, which you'll deploy as a web service.

    Here's how this process generally works:

    1. Setting up the Azure Machine Learning Workspace: This is where all your ML assets will be stored and managed, such as compute resources, models, data, and experiments. The workspace is the top-level resource for Azure Machine Learning.

    2. Creating a Compute Cluster: This is the compute resource that will be used to train the model and possibly also serve it. It should be created within the aforementioned workspace and should be configured to scale according to the workload.

    3. Registering the Model: Once your model is trained, you will need to register it within your Azure Machine Learning workspace. This step involves uploading the model file, which makes it available for deployment.

    4. Deploying the Model as a Web Service: Finally, the registered model can be deployed as a web service on the compute cluster you created. This allows for scalable, real-time inference.

    Let's write a Pulumi program to achieve this.

    import pulumi
    import pulumi_azure_native as azure_native

    # Read stack configuration (set with `pulumi config set <key> <value>`).
    config = pulumi.Config()
    resource_group_name = config.require("resource_group_name")
    location = config.require("location")

    # Create an Azure Machine Learning workspace.
    # NOTE: a production workspace also needs an associated storage account,
    # key vault, and Application Insights resource; these are omitted here
    # to keep the example focused.
    workspace = azure_native.machinelearningservices.Workspace(
        "workspace",
        resource_group_name=resource_group_name,
        location=location,
        workspace_name="my-ml-workspace",
        sku=azure_native.machinelearningservices.SkuArgs(
            name="Standard",
        ),
    )

    # Create a compute cluster within the workspace, autoscaling
    # between 0 and 10 nodes.
    compute_cluster = azure_native.machinelearningservices.Compute(
        "computeCluster",
        resource_group_name=resource_group_name,
        workspace_name=workspace.name,
        compute_name="my-compute-cluster",
        properties=azure_native.machinelearningservices.AmlComputeArgs(
            compute_type="AmlCompute",
            properties=azure_native.machinelearningservices.AmlComputePropertiesArgs(
                vm_size="STANDARD_NC6",
                vm_priority="LowPriority",
                scale_settings=azure_native.machinelearningservices.ScaleSettingsArgs(
                    min_node_count=0,
                    max_node_count=10,
                ),
            ),
        ),
    )

    # Assuming the model has been trained and the file 'model.pkl' is
    # available, register it within the workspace.
    # NOTE: the resource and argument classes used for model registration
    # and endpoints below are a sketch; their exact names and shapes vary
    # across pulumi_azure_native versions (recent versions expose
    # ModelVersion, OnlineEndpoint, and OnlineDeployment resources), so
    # check the Pulumi registry for the provider version you have installed.
    model = azure_native.machinelearningservices.Model(
        "model",
        resource_group_name=resource_group_name,
        workspace_name=workspace.name,
        model_name="my-trained-model",
        properties=azure_native.machinelearningservices.ModelArgs(
            model_uri="path/to/model.pkl",  # Replace this with the path to your model file.
            description="My trained model",
            frameworks=azure_native.machinelearningservices.FrameworkArgs(
                name="ScikitLearn",
                version="0.23.2",
            ),
        ),
    )

    # Deploy the registered model as a real-time web service on the
    # compute cluster.
    deployment = azure_native.machinelearningservices.Endpoint(
        "endpoint",
        resource_group_name=resource_group_name,
        workspace_name=workspace.name,
        endpoint_name="my-model-endpoint",
        properties=azure_native.machinelearningservices.EndpointArgs(
            compute_type="AmlCompute",
            properties=azure_native.machinelearningservices.RealtimeEndpointArgs(
                target_compute_name=compute_cluster.name,
                scoring_uri="http://my-model-endpoint/score",
                model_id=model.id,
                environment_id="AzureML-sklearn-0.23.2",  # Must match your model's framework and version.
            ),
        ),
    )

    # Export the web service URI so you know where to send inference requests.
    pulumi.export(
        "endpoint_uri",
        deployment.properties.apply(lambda props: props["scoring_uri"]),
    )

    Explanation of the program:

    • We start by importing the required packages and creating an Azure Machine Learning workspace.
    • Next, we create a compute cluster named my-compute-cluster within this workspace with autoscaling enabled. It can scale from 0 to 10 nodes based on demand.
    • We then register the model file (model.pkl) in our workspace. You must replace model_uri with the actual location of your trained model file.
    • After registration, we deploy the model as a real-time endpoint on the compute cluster, exposing it as a web service.
    • Finally, we export the endpoint URI which you can use to send inference requests to your deployed model.
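
    Once the endpoint is live, a client can POST JSON to the exported scoring URI. Below is a minimal sketch using only the Python standard library; the scoring URI and the payload shape are placeholders, so substitute the endpoint_uri exported by the program and the input format your model actually expects (a deployed endpoint will typically also require an authorization header).

```python
import json
import urllib.request

# Placeholder values: use the exported endpoint_uri and the input
# format your model actually expects.
scoring_uri = "http://my-model-endpoint/score"
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}

# Build a POST request carrying the JSON payload.
request = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send the request once the endpoint is live:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```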

    Please replace the values of workspace_name, compute_name, model_uri, scoring_uri, and any other placeholders with actual values from your environment. Also, ensure you've configured your Azure credentials and selected the correct subscription (for example, via az login) before running this Pulumi program.
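
    For completeness, the two configuration values the program reads can be set on the stack before deploying. The resource group and region names below are placeholders for your own values:

```shell
# Set the stack configuration read via config.require(...)
pulumi config set resource_group_name my-resource-group
pulumi config set location westus2

# Preview and apply the deployment
pulumi preview
pulumi up
```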