1. Hosting Inference APIs on Azure App Service


    To host an inference API on Azure App Service, you will need to set up an Azure App Service and an App Service Plan. Azure App Service is a fully managed platform for building, deploying, and scaling web apps, and you can use it to host a web API that serves machine learning model inferences.

    Here's what we'll do in this program:

    1. Create an Azure Resource Group, which acts as a logical container for our Azure resources.
    2. Set up an Azure App Service Plan, which defines the underlying VM that your app runs on and manages the scaling of your app.
    3. Create an Azure App Service, to deploy and run the inference API.

    Below is the Python program for deploying an inference API using the Pulumi Azure Native provider:

    import pulumi
    import pulumi_azure_native as azure_native

    # Create an Azure Resource Group
    resource_group = azure_native.resources.ResourceGroup('resource_group')

    # Set up an Azure App Service Plan (this example uses a B1 Basic tier, which you can change as needed)
    app_service_plan = azure_native.web.AppServicePlan('app_service_plan',
        resource_group_name=resource_group.name,
        location=resource_group.location,
        sku=azure_native.web.SkuDescriptionArgs(
            name="B1",
            tier="Basic",
            size="B1",
            family="B",
            capacity=1
        ),
        kind='App',
        reserved=False  # This determines whether you run on Windows (False) or Linux (True). Set this based on your needs.
    )

    # Create an Azure App Service
    app_service = azure_native.web.WebApp('app_service',
        resource_group_name=resource_group.name,
        location=resource_group.location,
        server_farm_id=app_service_plan.id,
        https_only=True,  # Redirects all HTTP traffic to HTTPS
        site_config=azure_native.web.SiteConfigArgs(
            app_settings=[
                # Here you can pass configuration such as connection strings and other environment variables.
                # "WEBSITE_RUN_FROM_PACKAGE" is an example app setting for running the app from a deployment
                # package (zip file); add the other settings your inference API needs.
                azure_native.web.NameValuePairArgs(name="WEBSITE_RUN_FROM_PACKAGE", value="1")
            ]
        )
    )

    # Export the primary endpoint for the App Service, which is the URL of the app
    pulumi.export('endpoint', pulumi.Output.concat('https://', app_service.default_host_name))

    In the above program:

    • We start by importing pulumi and the pulumi_azure_native library, which contains the resources needed to interact with Azure.
    • A ResourceGroup is initialized, which creates a new resource group where all our resources will live.
    • An AppServicePlan is defined with a specific SKU (size and tier) to allocate for our app. In our example we use the B1 Basic tier, which is cost-effective and suitable for a small-scale production API. The reserved flag is set to False because we're assuming a .NET or Node.js app running on Windows; if you're deploying a Docker container or a Linux app, set it to True.
    • We create the WebApp resource, which is our App Service. Its configuration sets https_only to True, so all insecure HTTP requests are redirected to HTTPS. In site_config, we set the app setting WEBSITE_RUN_FROM_PACKAGE to "1" as an example, indicating that the app should run from a deployment package; this is a common setting for App Services that serve inference APIs (a minimal sketch of such an API follows this list). Add whatever settings your own API requires.
    • Finally, we export the endpoint, the URL of our deployed API. This is how you'd interact with your inference API once it's up and running.
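
    The Pulumi program above provisions the hosting, but the inference code itself lives in the deployment package. As a rough illustration only, here is a minimal Flask app serving predictions; Flask, the /predict route, and the model.pkl artifact are all assumptions for this sketch, not part of the program above:

    # Minimal sketch of an inference API that the App Service could host.
    # Flask and the pickled 'model.pkl' artifact are assumptions; swap in
    # your own web framework and model-loading logic.
    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    with open("model.pkl", "rb") as f:  # hypothetical model artifact
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]
        prediction = model.predict([features])  # assumes a scikit-learn-style model
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        app.run()

    You would zip an app like this together with its dependencies and publish it as the deployment package that WEBSITE_RUN_FROM_PACKAGE points at.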

    This program provides a solid starting point for deploying a web service that can host your inference API. Make sure the Azure CLI is configured with the correct account and Pulumi is set up, then run pulumi up to deploy the code.
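
    Once pulumi up finishes, you can retrieve the exported endpoint with pulumi stack output endpoint and exercise the API. Here is a sketch using the requests library; the /predict route and payload shape follow the hypothetical Flask app above and are assumptions, not outputs of the Pulumi program:

    import requests

    # Placeholder: substitute the value printed by `pulumi stack output endpoint`
    endpoint = "https://<your-app>.azurewebsites.net"

    # The /predict route and JSON payload shape are assumptions from the sketch above
    response = requests.post(f"{endpoint}/predict", json={"features": [1.0, 2.0, 3.0]})
    print(response.json())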