1. Throttling AI Service Requests using Azure API Management Policies


    Throttling requests to an AI service in Azure is a common way to protect an API's performance and ensure fair usage among clients. In Azure, you achieve this by applying policies to an API Management (APIM) service. Two policy families are relevant here: rate-limit (and rate-limit-by-key), which caps the number of calls within a short renewal period, and quota (and quota-by-key), which caps the total number of calls over a longer renewal period.

    To apply a throttle policy, you create an instance of the azure-native.apimanagement.Policy resource, which sets the policy at global scope, that is, for every API in the API Management service. Product and API scopes have their own resources in the same module (ProductPolicy and ApiPolicy). In each case, throttling is configured by specifying limits within the policy definition, which is written in XML.
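
    Because the policy definition is plain XML, it can be generated from a few numbers rather than written by hand. The following is a minimal sketch, not part of the original program: build_throttle_policy and its parameters are hypothetical names, and the optional counter_key argument is an assumption that switches the output to the rate-limit-by-key and quota-by-key elements.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_throttle_policy(
    rate_calls: int,
    rate_period_s: int,
    quota_calls: int,
    quota_period_s: int,
    counter_key: Optional[str] = None,
) -> str:
    """Return an APIM policy document with one rate limit and one quota.

    Hypothetical helper: if counter_key is given, the "-by-key" policy
    elements are emitted with that expression as their counter-key.
    """
    limits = {
        "rate_calls": rate_calls,
        "rate_period_s": rate_period_s,
        "quota_calls": quota_calls,
        "quota_period_s": quota_period_s,
    }
    for name, value in limits.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value!r}")

    suffix = "-by-key" if counter_key else ""
    extra = {"counter-key": counter_key} if counter_key else {}

    policies = ET.Element("policies")
    inbound = ET.SubElement(policies, "inbound")
    ET.SubElement(
        inbound,
        "rate-limit" + suffix,
        {"calls": str(rate_calls), "renewal-period": str(rate_period_s), **extra},
    )
    ET.SubElement(
        inbound,
        "quota" + suffix,
        {"calls": str(quota_calls), "renewal-period": str(quota_period_s), **extra},
    )
    backend = ET.SubElement(policies, "backend")
    ET.SubElement(backend, "forward-request")
    ET.SubElement(policies, "outbound")
    return ET.tostring(policies, encoding="unicode")


# 10 calls per 60 seconds and 100 calls per day, as in the walkthrough.
print(build_throttle_policy(10, 60, 100, 86400))
```

    The returned string can be passed as the value of a policy resource. Building the document with an XML library rather than string formatting keeps it well-formed even when the limits come from configuration.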

    Here is a detailed walkthrough of how to create an API Management service in Azure and apply a throttle policy using Pulumi with Python.

    Throttle Policy Creation with Pulumi

    First, we import the Pulumi libraries. Then we declare the API Management service instance; Pulumi creates it on the first deployment and leaves it unchanged on later deployments if its definition has not changed. Finally, we define a policy with a throttle rule.

    import pulumi
    import pulumi_azure_native as azure_native

    # Create a resource group (by default Pulumi appends a random suffix to the Azure name)
    resource_group = azure_native.resources.ResourceGroup("my-resource-group")

    # Create an API Management service instance within the resource group
    api_management_service = azure_native.apimanagement.ApiManagementService(
        "my-api-management-service",
        resource_group_name=resource_group.name,
        publisher_name="My Company",
        publisher_email="contact@mycompany.com",
        sku=azure_native.apimanagement.ApiManagementServiceSkuPropertiesArgs(
            name=azure_native.apimanagement.SkuType.DEVELOPER,  # Developer SKU, for example purposes
            capacity=1,
        ),
    )

    # Define the throttling policy in XML format.
    # The plain rate-limit and quota policies are not allowed at global scope,
    # so the by-key variants are used here. Keying on the caller's IP address
    # is an illustrative choice; pick a key that identifies your clients.
    throttle_policy_xml = """<policies>
      <inbound>
        <rate-limit-by-key calls="10" renewal-period="60" counter-key="@(context.Request.IpAddress)" />
        <quota-by-key calls="100" renewal-period="86400" counter-key="@(context.Request.IpAddress)" />
      </inbound>
      <backend>
        <forward-request />
      </backend>
      <outbound />
    </policies>"""

    # Apply the policy at global scope (all APIs in the service)
    api_policy = azure_native.apimanagement.Policy(
        "my-api-throttle-policy",
        resource_group_name=resource_group.name,
        service_name=api_management_service.name,
        policy_id="policy",  # the global policy is always identified as "policy"
        value=throttle_policy_xml,
        format="xml",
    )

    # Export the gateway URL of the API Management instance as a stack output
    pulumi.export("api_management_endpoint", api_management_service.gateway_url)

    In this program:

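    • We use the azure_native.resources.ResourceGroup class to create a new resource group. "my-resource-group" is the Pulumi logical name; by default Pulumi appends a random suffix to form the Azure resource name, and an existing group is not reused unless it is imported into the stack.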

    • We then create an API Management service using azure_native.apimanagement.ApiManagementService and specify the basic properties like the name of the publisher and the contact email. For this example, we use the "Developer" SKU type which is suitable for development and testing purposes.

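    • The next step is to define a throttling policy in XML format. It sets a rate limit of 10 calls per minute (60 seconds) and a quota of 100 calls per day (86400 seconds). In the APIM policy reference, the plain rate-limit and quota elements are scoped to products (rate-limit also to APIs and operations), whereas rate-limit-by-key and quota-by-key are also allowed at global scope.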

    • This policy is applied at global scope to the API Management service instance using the azure_native.apimanagement.Policy class, with the value parameter containing our XML policy and the format set to "xml".

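    • Finally, we use pulumi.export to publish the gateway URL of the API Management service as a stack output, so it can be read after deployment (for example with pulumi stack output). The export only reports the URL; it does not change who can reach the gateway.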
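
    The two limits interact: a client that calls at the full rate limit uses up the daily quota long before the day ends. The sketch below works through that arithmetic for the values in the example. It is a simplified model that treats the rate limit as a steady average rate and ignores bursts at window boundaries; the function names are hypothetical.

```python
def seconds_to_exhaust_quota(rate_calls: int, rate_period_s: int, quota_calls: int) -> float:
    """Approximate time for a client sending at the full rate limit to use up its quota."""
    return quota_calls * rate_period_s / rate_calls


def sustainable_calls_per_minute(quota_calls: int, quota_period_s: int) -> float:
    """Average call rate that spreads the quota evenly over the whole quota period."""
    return quota_calls * 60 / quota_period_s


# Values from the example policy: 10 calls per 60 s, 100 calls per 86400 s.
burn_time_s = seconds_to_exhaust_quota(rate_calls=10, rate_period_s=60, quota_calls=100)
even_rate = sustainable_calls_per_minute(quota_calls=100, quota_period_s=86400)

print(f"Quota exhausted after about {burn_time_s / 60:.0f} minutes at full rate")
print(f"Even pacing allows about {even_rate:.3f} calls per minute")
```

    With these numbers, the quota lasts roughly 10 minutes at full rate, so the rate limit mainly smooths bursts while the quota is the binding daily cap. If that is not the intent, raise the quota or lower the rate limit.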

    This Pulumi program creates and applies the policy to an Azure API Management service, effectively throttling the number of requests that clients can make as per the limits defined in the policy. Please note that this example is for demonstration purposes and should be adapted to your actual quota and rate limits requirements.

    You can find more details in the Azure API Management policy documentation.

    Deploy this Pulumi program using the Pulumi CLI by running pulumi up inside the directory where this code is saved to provision your API Management service with the throttle policy in Azure.