Throttling AI Service Requests using Azure API Management Policies

Question

Pulumi · Accepted Answer

Throttling requests to an AI service in Azure is a common way to protect an API's performance and ensure fair usage among clients. In Azure, you achieve this by applying policies to an API Management (APIM) service. Two such policies are `rate-limit` and `quota`, each of which specifies a maximum number of calls and a renewal period in seconds.

To apply a throttle policy to the whole service (global scope), you create an instance of the `azure-native.apimanagement.Policy` resource. Policies can also be attached at product scope or API scope; in the azure-native provider those scopes have their own resources, `ProductPolicy` and `ApiPolicy`. In every case, the limits are specified inside the policy definition, which is written in XML.

Here is a walkthrough of how to create an API Management service in Azure and apply a throttle policy using Pulumi with Python.

### Throttle Policy Creation with Pulumi

First, we import the necessary Pulumi libraries. Then we create the resource group and the API Management service instance. Finally, we define a policy with a throttle rule and apply it.

```python
import pulumi
import pulumi_azure_native as azure_native

# Create a resource group to hold the API Management service
resource_group = azure_native.resources.ResourceGroup("my-resource-group")

# Create an API Management service instance within the resource group
api_management_service = azure_native.apimanagement.ApiManagementService(
    "my-api-management-service",
    resource_group_name=resource_group.name,
    publisher_name="My Company",
    publisher_email="contact@mycompany.com",
    sku=azure_native.apimanagement.ApiManagementServiceSkuPropertiesArgs(
        # Choosing the Developer SKU for example purposes
        name=azure_native.apimanagement.SkuType.DEVELOPER,
        capacity=1,
    ),
)

# Define the throttling policy in XML format.
# The limits match the description below; the exact element layout
# (including the forward-request in <backend>) is an assumption.
throttle_policy_xml = """
<policies>
    <inbound>
        <rate-limit calls="10" renewal-period="60" />
        <quota calls="100" renewal-period="86400" />
    </inbound>
    <backend>
        <forward-request />
    </backend>
    <outbound />
</policies>
"""

# Apply the policy to the API Management service
api_policy = azure_native.apimanagement.Policy(
    "my-api-throttle-policy",
    resource_group_name=resource_group.name,
    service_name=api_management_service.name,
    policy_id="policy",  # the service-wide policy is identified as "policy"
    value=throttle_policy_xml,
    format="xml",
)

# To make the API endpoint accessible, output the URL of the API Management instance
pulumi.export("api_management_endpoint", api_management_service.gateway_url)
```

In this program:

- We use the `azure_native.resources.ResourceGroup` class to create a new resource group with the logical name "my-resource-group".
- We then create an API Management service using `azure_native.apimanagement.ApiManagementService` and specify basic properties such as the publisher name and contact email. For this example, we use the "Developer" SKU, which is suitable for development and testing.
- The next step is to define a throttling policy in XML format. Here, the `rate-limit` policy allows 10 calls per minute (60 seconds) and the `quota` policy allows 100 calls per day (86400 seconds).
- This policy is applied to the API Management service instance using the `azure_native.apimanagement.Policy` class, with the `value` parameter containing our XML policy and the `format` set to "xml".
- Finally, we expose the gateway URL of the API Management service using `pulumi.export`.

This Pulumi program creates an Azure API Management service and applies the policy to it, throttling the number of requests that clients can make according to the limits defined in the policy.

Azure's policy reference lists `rate-limit` for product, API, and operation scope and `quota` for product scope. If the service rejects these elements in a global policy, switch to `rate-limit-by-key` and `quota-by-key`, or attach the policy through `ProductPolicy` or `ApiPolicy` instead.

Please note that this example is for demonstration purposes and should be adapted to your actual quota and rate limit requirements.

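
Since the limits usually need adapting, it can help to generate the policy XML from parameters rather than editing a string by hand. The following is a minimal sketch using only the Python standard library. The helper name `build_throttle_policy` is hypothetical, and the element layout (an `inbound` section with `rate-limit` and `quota`, plus a `forward-request` in `backend`) is an assumption based on the limits described above, not a layout confirmed by the original answer.

```python
import xml.etree.ElementTree as ET

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86400


def build_throttle_policy(calls_per_minute: int, calls_per_day: int) -> str:
    """Return an APIM policy document with a rate limit and a daily quota.

    Hypothetical helper: the section layout is an assumption, not an
    official template.
    """
    if calls_per_minute <= 0 or calls_per_day <= 0:
        raise ValueError("call limits must be positive integers")

    policies = ET.Element("policies")

    # Throttling elements belong in the inbound section
    inbound = ET.SubElement(policies, "inbound")
    ET.SubElement(
        inbound,
        "rate-limit",
        {"calls": str(calls_per_minute), "renewal-period": str(SECONDS_PER_MINUTE)},
    )
    ET.SubElement(
        inbound,
        "quota",
        {"calls": str(calls_per_day), "renewal-period": str(SECONDS_PER_DAY)},
    )

    # Assumed: forward the request to the backend once the limits pass
    backend = ET.SubElement(policies, "backend")
    ET.SubElement(backend, "forward-request")

    return ET.tostring(policies, encoding="unicode")


if __name__ == "__main__":
    print(build_throttle_policy(calls_per_minute=10, calls_per_day=100))
```

The returned string could be passed as the `value` argument of the policy resource in place of a hand-written literal, which keeps the numeric limits in one place.
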
You can find more details in the Pulumi Registry documentation for the resource:

- [Azure API Management Policy - azure-native.apimanagement.Policy](https://www.pulumi.com/registry/packages/azure-native/api-docs/apimanagement/policy/)

Deploy this Pulumi program by running `pulumi up` in the directory where the code is saved. This provisions your API Management service in Azure with the throttle policy applied.
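
Once the policy is live, callers that exceed the rate limit should expect throttled responses. Azure documents `429 Too Many Requests` for an exceeded `rate-limit`, along with a `Retry-After` header. The sketch below shows one way a client might react. The function `call_with_backoff` and its `send` callback are hypothetical names used for illustration; `send` stands in for whatever HTTP call your client makes and is assumed to return a `(status, headers)` pair.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting and retrying while the gateway answers 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        # Return anything that is not a throttle response, or the final attempt
        if status != 429 or attempt == max_attempts:
            return status, headers
        # Assumed: Retry-After carries a delay in seconds; fall back to 1 second
        try:
            delay = float(headers.get("Retry-After", "1"))
        except ValueError:
            delay = 1.0
        sleep(delay)

    raise RuntimeError("unreachable: the loop always returns")


if __name__ == "__main__":
    # Simulated gateway: throttled once, then successful
    replies = iter([(429, {"Retry-After": "2"}), (200, {})])
    print(call_with_backoff(lambda: next(replies), sleep=lambda s: None))
```

Injecting `sleep` keeps the helper easy to exercise without real delays. Only the rate-limit case is retried here; an exhausted daily quota generally will not clear within a short wait, so it is returned to the caller as-is.
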