Private DNS for Secure AI Model Serving Endpoints

Question

Pulumi · Accepted Answer

To serve AI models securely over private DNS, you typically combine a cloud provider's DNS and networking services so that your AI endpoints are hosted in a private environment. The private DNS service manages the domain names for your endpoints inside that private network, so the names resolve only from the networks you link to the zone.

To build this infrastructure with Pulumi, you would combine resources from the cloud provider that hosts your AI services. Let's take Azure as an example and imagine we want to create and secure Azure Machine Learning endpoints. This involves the following steps:

1. Creating a private DNS zone within Azure to manage the custom domain names for the endpoints.
2. Creating an endpoint within the Azure Machine Learning service.
3. Configuring the network to restrict access to the endpoint to only certain sources, using services such as Azure Private Endpoints.

The Pulumi Python program further down accomplishes these tasks using the following resources:

- **azure-native.network.PrivateZone**: The private DNS zone where the DNS records for the AI model serving endpoints will be created. Private DNS zones enable you to use your own custom domain names rather than the Azure-provided domain names. These zones are ideal for security and isolation as they are not resolvable over the internet.

- **azure-native.machinelearningservices.InferenceEndpoint**: This resource is used to deploy, manage, and score machine learning models in Azure. The models are hosted in a secure and scalable manner.

- **azure-native.network.PrivateEndpoint**: A private endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link. It uses a private IP address from your VNet, effectively bringing the service into your VNet.

Here's the program:

```python
import pulumi
import pulumi_azure_native as azure_native

# Replace these variables with the appropriate values for your environment
resource_group_name = "my-ai-resources"
private_dns_zone_name = "ai-models.example.com"
location = "East US"
workspace_name = "my-ai-workspace"
endpoint_name = "my-inference-endpoint"

# Create the resource group (location set explicitly rather than relying on provider config)
resource_group = azure_native.resources.ResourceGroup('resourceGroup',
    resource_group_name=resource_group_name,
    location=location)

# Create a private DNS zone
private_dns_zone = azure_native.network.PrivateZone('privateDnsZone',
    resource_group_name=resource_group.name,
    private_zone_name=private_dns_zone_name,
    location="Global")

# Create the machine learning workspace
ml_workspace = azure_native.machinelearningservices.Workspace('mlWorkspace',
    resource_group_name=resource_group.name,
    location=location,
    workspace_name=workspace_name)

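# Create an inference endpoint within the machine learning workspace
# NOTE: the argument names accepted by this resource differ between azure-native
# provider versions; check the API docs for your installed version before deploying.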
inference_endpoint = azure_native.machinelearningservices.InferenceEndpoint('inferenceEndpoint',
    resource_group_name=resource_group.name,
    workspace_name=ml_workspace.name,
    endpoint_name=endpoint_name,
    identity=azure_native.machinelearningservices.IdentityArgs(
        type="SystemAssigned"
    ),
    location=location,
    sku=azure_native.machinelearningservices.SkuArgs(
        name="Basic"
    ))

# Link the private DNS zone to the virtual network used by the machine learning workspace.
# This assumes the VNet already exists and its resource ID is known.
# Replace 'your_vnet_id' with the actual resource ID of your virtual network.
private_dns_zone_vnet_link = azure_native.network.VirtualNetworkLink('privateDnsZoneVnetLink',
    resource_group_name=resource_group.name,
    private_zone_name=private_dns_zone.name,
    location="Global",  # Like the zone itself, the link is a global resource
    virtual_network=azure_native.network.SubResourceArgs(id="your_vnet_id"),  # Replace with actual VNet ID
    virtual_network_link_name="my-vnet-link",
    registration_enabled=True)

# Create a private endpoint for the workspace that hosts the inference endpoint.
# Azure Machine Learning exposes Private Link at the workspace level, using the
# "amlworkspace" group ID.
private_endpoint = azure_native.network.PrivateEndpoint('privateEndpoint',
    resource_group_name=resource_group.name,
    location=location,
    private_endpoint_name="my-endpoint",
    private_link_service_connections=[azure_native.network.PrivateLinkServiceConnectionArgs(
        name=endpoint_name,
        private_link_service_id=ml_workspace.id,
        group_ids=["amlworkspace"],  # Private Link sub-resource for Azure Machine Learning
    )],
    subnet=azure_native.network.SubnetArgs(id="your_subnet_id"))  # Replace with your subnet ID

# Export the endpoint details
pulumi.export("private_dns_zone_id", private_dns_zone.id)
pulumi.export("inference_endpoint_id", inference_endpoint.id)
pulumi.export("private_endpoint_id", private_endpoint.id)
```

In this program:

- We create a resource group which is a container that holds related resources for an Azure solution.
- We set up a private DNS zone that is used within or across one or multiple virtual networks in Azure.
- We make a machine learning workspace where all the components and assets of machine learning are stored and managed.
- We then create an inference endpoint, a scalable and secure location to deploy and manage machine learning models.
- We link the private DNS zone to the virtual network associated with the workspace so that resources in that network can resolve the names in the zone.
- Finally, we create a private endpoint, which gives the service a private IP address in your subnet so that traffic to it stays on the Azure backbone network.

This program assumes you have certain resources (like the VNet and Subnet) already configured. You would need to replace the placeholders (e.g., `'your_vnet_id'`, `'your_subnet_id'`) with your actual resource IDs. The `pulumi.export` statements at the end output the created resource IDs for your reference.
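The VNet and subnet placeholders expect full Azure Resource Manager IDs rather than bare names. As a rough sketch, the helpers below assemble those IDs in the standard ARM format. The function names, the all-zero subscription ID, and the resource group and network names are hypothetical and only illustrate the shape of the values.

```python
def vnet_id(subscription_id: str, resource_group: str, vnet_name: str) -> str:
    """Build the ARM resource ID of a virtual network."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
    )


def subnet_id(subscription_id: str, resource_group: str,
              vnet_name: str, subnet_name: str) -> str:
    """Build the ARM resource ID of a subnet inside a virtual network."""
    parent = vnet_id(subscription_id, resource_group, vnet_name)
    return f"{parent}/subnets/{subnet_name}"


# Hypothetical values; substitute your own subscription, resource group, and names.
SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"
your_vnet_id = vnet_id(SUBSCRIPTION, "my-network-rg", "my-vnet")
your_subnet_id = subnet_id(SUBSCRIPTION, "my-network-rg", "my-vnet", "my-subnet")
```

In practice you might instead read these IDs from Pulumi configuration or from the outputs of the stack that owns the network, so they are not hard-coded.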

Deploying this Pulumi program sets up private connectivity to your AI model serving endpoints: the private DNS zone is not resolvable from the public internet, and the private endpoint gives clients in the virtual network a private path to the service. To make sure the endpoints are reachable only from the approved virtual network, you would typically also disable public network access on the service.
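One way to sanity-check the result from a machine inside the linked virtual network is to confirm that the endpoint's name resolves to a private address. The sketch below uses only the Python standard library. The function names are illustrative, and the hostname in the usage comment is a hypothetical record in the example zone that the program does not create for you.

```python
import ipaddress
import socket


def is_private_address(address: str) -> bool:
    """Return True if the IP address is in a private (non-public) range."""
    # Drop any IPv6 zone suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_private


def resolves_privately(hostname: str) -> bool:
    """Return True if the hostname resolves and every returned address is private."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # The name did not resolve from this machine
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_private_address(a) for a in addresses)


# Example usage (hypothetical hostname; run from inside the linked VNet):
# resolves_privately("my-inference-endpoint.ai-models.example.com")
```

A `False` result means the name either did not resolve or resolved to at least one public address, which is worth investigating before you rely on the endpoint being private.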

Be sure to replace the placeholder values and to configure any permissions or additional settings needed to meet the security requirements of your application and organization.