1. API Throttling Rules for AI Workloads with Named Values


    API throttling is an important aspect of managing an application's backend, especially for AI workloads that may send a large number of requests to your API. Throttling rules help ensure that your API can handle these requests gracefully without overwhelming your backend systems or degrading the user experience.

    In this section, we create API throttling rules using Pulumi with the Azure API Management service. Azure API Management lets you publish, manage, secure, and analyze your APIs at scale. Within the service, you define policies that specify throttling rules. You can also use Named Values (formerly called properties), which can optionally be marked as secret, to store reusable configuration values and reference them across your policies.

    The azure-native.apimanagement.ApiManagementService resource represents the Azure API Management service instance itself. The throttling rules are declared in a policy attached to this service.

    The azure-native.apimanagement.Policy resource sets the service-wide (global) policy, which applies to every API in the instance. A policy is an XML document of statements, such as rate limits and quotas, that run on each request or response.

    The azure-native.apimanagement.NamedValue resource creates a Named Value that policies can reference instead of hardcoding a value. A policy refers to a Named Value by its display name wrapped in double braces, for example {{ThrottleLimit}}.
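    To make the double-brace reference concrete, the sketch below models Named Value substitution in plain Python. It is an illustration only: render_named_values is a hypothetical helper, not part of API Management or Pulumi, and it ignores secrets and the gateway's real resolution logic. It also shows a Python pitfall worth knowing before writing policy XML in code: inside an f-string, {{ and }} are escapes for single braces, so a Named Value reference written in an f-string loses one pair of braces.

```python
import re

# Matches a Named Value reference such as {{ThrottleLimit}}.
_REFERENCE = re.compile(r"\{\{([A-Za-z0-9._-]+)\}\}")


def render_named_values(policy_text: str, named_values: dict[str, str]) -> str:
    """Hypothetical model of Named Value substitution: replace each
    {{DisplayName}} with its value, failing on unknown names."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in named_values:
            raise KeyError(f"Named Value not found: {name}")
        return named_values[name]

    return _REFERENCE.sub(lookup, policy_text)


fragment = 'calls="{{ThrottleLimit}}" renewal-period="60"'
print(render_named_values(fragment, {"ThrottleLimit": "10"}))
# prints: calls="10" renewal-period="60"

# Pitfall: doubled braces in an f-string collapse to single braces.
print(f"{{ThrottleLimit}}")
# prints: {ThrottleLimit}
```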

    Here is an example program that sets up API throttling rules using Pulumi with Azure:

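```python
import pulumi
import pulumi_azure_native as azure_native

# Assumes this resource group already exists.
resource_group_name = "myResourceGroup"

# Create an API Management service.
api_management_service = azure_native.apimanagement.ApiManagementService(
    "apiManagementService",
    resource_group_name=resource_group_name,
    location="West US",
    publisher_name="My Company",
    publisher_email="publisher@example.com",
    sku=azure_native.apimanagement.ApiManagementServiceSkuPropertiesArgs(
        name="Developer",  # Developer tier for demonstration purposes only.
        capacity=1,
    ),
)

# Create a Named Value holding the throttle limit so policies can reuse it.
throttle_limit_named_value = azure_native.apimanagement.NamedValue(
    "throttleLimitNamedValue",
    resource_group_name=resource_group_name,
    service_name=api_management_service.name,
    display_name="ThrottleLimit",  # Referenced in policies as {{ThrottleLimit}}.
    value="10",  # Calls allowed per renewal period (60 seconds in the policy).
    secret=False,
)

# Global policy XML. This is a plain string, not an f-string: API Management
# replaces {{ThrottleLimit}} with the Named Value's value itself.
# rate-limit-by-key is used because plain rate-limit is not supported at the
# global scope that the Policy resource targets.
policy_xml = """<policies>
  <inbound>
    <rate-limit-by-key calls="{{ThrottleLimit}}" renewal-period="60"
        counter-key="@(context.Request.IpAddress)" />
    <quota-by-key calls="100" renewal-period="86400"
        counter-key='@(context.Request.Headers.GetValueOrDefault("Authorization", ""))' />
  </inbound>
  <backend>
    <forward-request />
  </backend>
  <outbound />
  <on-error />
</policies>"""

# Define the service-wide policy that throttles API calls.
policy = azure_native.apimanagement.Policy(
    "policy",
    resource_group_name=resource_group_name,
    service_name=api_management_service.name,
    format="xml",
    value=policy_xml,
    # A policy can only reference a Named Value that already exists.
    opts=pulumi.ResourceOptions(depends_on=[throttle_limit_named_value]),
)

# Export the gateway URL that clients use to call the APIs.
pulumi.export("apiManagementEndpoint", api_management_service.gateway_url)
```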
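    Assembling policy XML with string formatting makes it easy to get quoting or brace escaping wrong. One option, sketched here as a suggestion rather than anything Pulumi or Azure requires, is to build the document with Python's standard xml.etree.ElementTree, which escapes attribute values (including the double quotes inside policy expressions) automatically. The function build_throttling_policy and its parameters are hypothetical names. It uses rate-limit-by-key, the keyed variant of rate-limit that can be applied at the global scope the Policy resource targets, and the returned string could be passed as value= to that resource.

```python
import xml.etree.ElementTree as ET


def build_throttling_policy(
    named_value: str = "ThrottleLimit",
    renewal_period: int = 60,
    quota_calls: int = 100,
    quota_period: int = 86400,
) -> str:
    """Return a global API Management policy document as an XML string."""
    policies = ET.Element("policies")
    inbound = ET.SubElement(policies, "inbound")
    # Per-client-IP rate limit whose call count comes from a Named Value.
    ET.SubElement(inbound, "rate-limit-by-key", {
        "calls": "{{" + named_value + "}}",
        "renewal-period": str(renewal_period),
        "counter-key": "@(context.Request.IpAddress)",
    })
    # Quota counted separately for each Authorization header value.
    ET.SubElement(inbound, "quota-by-key", {
        "calls": str(quota_calls),
        "renewal-period": str(quota_period),
        "counter-key": '@(context.Request.Headers.GetValueOrDefault("Authorization", ""))',
    })
    # At global scope, forward-request is what sends the call to the backend.
    backend = ET.SubElement(policies, "backend")
    ET.SubElement(backend, "forward-request")
    ET.SubElement(policies, "outbound")
    ET.SubElement(policies, "on-error")
    # encoding="unicode" returns a str; quotes in attributes become &quot;.
    return ET.tostring(policies, encoding="unicode")


policy_xml = build_throttling_policy()
print(policy_xml)
```

    Because the XML parser on the API Management side decodes &quot; back to a double quote, the policy expression arrives unchanged, while {{ThrottleLimit}} passes through untouched for the gateway to substitute.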

    In the above program:

    • We created an instance of the API Management Service where our APIs will be managed.
    • We defined a Named Value to store our throttle limit, making it easier to reference and update independently of the policies.
    • We applied a policy to our service that includes a rate-limit rule, parameterized by the Named Value we created earlier.

    The rate limit rule in the policy sets the throttling behavior. It allows at most a given number of calls (calls) within a period measured in seconds (renewal-period); with the values above, that is 10 calls every 60 seconds. Requests beyond the limit are rejected with HTTP status 429 (Too Many Requests) until the period renews.
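    To make the calls and renewal-period semantics concrete, here is a simplified fixed-window model in Python. It is only an illustration: FixedWindowLimiter is a hypothetical class, and API Management's own distributed counting may differ from this model. Each key gets its own counter, mirroring how a keyed limit tracks clients independently, and over-limit requests get the 429 status described above.

```python
class FixedWindowLimiter:
    """Hypothetical model: allow at most `calls` requests per key in each
    `renewal_period`-second window, counted from the key's first request."""

    def __init__(self, calls: int, renewal_period: float) -> None:
        self.calls = calls
        self.renewal_period = renewal_period
        # key -> [window_start_time, calls_used_in_window]
        self._windows: dict[str, list[float]] = {}

    def check(self, key: str, now: float) -> int:
        """Return 200 if the request is allowed, 429 if it is throttled."""
        window = self._windows.get(key)
        if window is None or now - window[0] >= self.renewal_period:
            window = [now, 0]  # start a fresh window for this key
            self._windows[key] = window
        if window[1] < self.calls:
            window[1] += 1
            return 200
        return 429


limiter = FixedWindowLimiter(calls=10, renewal_period=60)
# One client sends a request every second for 12 seconds.
statuses = [limiter.check("203.0.113.7", now=t) for t in range(12)]
print(statuses.count(200), statuses.count(429))  # 10 2
print(limiter.check("203.0.113.7", now=60))      # 200 (new window)
print(limiter.check("198.51.100.4", now=5))      # 200 (separate key)
```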

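    We also defined a quota-by-key rule. It is optional and sets a longer-term call quota for each value of a key: here 100 calls per 86,400 seconds (one day), keyed on the OAuth token sent in the request's Authorization header. Because usage is counted separately per key, one consumer exhausting its quota does not affect the others. Calls over the quota are rejected with HTTP status 403 (Forbidden).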
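    The two limits interact. With the values used above (10 calls per 60 seconds, 100 calls per 86,400 seconds), the rate limit alone would permit 10 × 1,440 = 14,400 calls a day, so for a single consumer counted under the same key by both rules, the daily quota is the binding constraint. A small helper makes that comparison explicit; binding_limit is a hypothetical name, and it assumes the horizon is a whole multiple of both periods.

```python
def binding_limit(rate_calls: int, rate_period_s: int,
                  quota_calls: int, quota_period_s: int,
                  horizon_s: int) -> tuple[int, str]:
    """Maximum calls one key can make over `horizon_s` seconds, and which
    limit caps it. Assumes horizon_s is a multiple of both periods."""
    if horizon_s % rate_period_s or horizon_s % quota_period_s:
        raise ValueError("horizon must be a multiple of both periods")
    by_rate = rate_calls * (horizon_s // rate_period_s)
    by_quota = quota_calls * (horizon_s // quota_period_s)
    if by_quota <= by_rate:
        return by_quota, "quota"
    return by_rate, "rate limit"


print(binding_limit(10, 60, 100, 86400, 86400))    # (100, 'quota')
print(binding_limit(10, 60, 20000, 86400, 86400))  # (14400, 'rate limit')
```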

    Exporting the gateway URL as a stack output makes the API endpoint easy to retrieve, for example with pulumi stack output apiManagementEndpoint, so it can be handed to the clients that call the APIs hosted on this service.

    This setup is a starting point for managing API throttling and can be adapted to suit the specific requirements of your AI workloads and API usage patterns. Depending on the expected load and performance targets, you would adjust the throttling limits accordingly.
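    As a rough starting point for that adjustment, you can compare the per-client limit with what the backend can sustain. The back-of-the-envelope sketch below assumes a backend capacity figure you would measure yourself (the 50 requests per second used here is purely hypothetical) and that every client sends at its full allowance; max_clients_at_full_rate is a hypothetical helper.

```python
import math


def max_clients_at_full_rate(backend_rps: float, calls: int,
                             renewal_period_s: float) -> int:
    """How many clients can each use their whole rate-limit allowance
    before the backend's sustained requests-per-second is exceeded."""
    # backend_rps / (calls / renewal_period_s), rearranged to avoid
    # dividing by a repeating fraction like 10/60.
    return math.floor(backend_rps * renewal_period_s / calls)


# Hypothetical backend that sustains 50 requests per second.
print(max_clients_at_full_rate(50, calls=10, renewal_period_s=60))  # 300
```

    Whatever number you settle on, only the value of the ThrottleLimit Named Value needs to change; the policy that references it can stay as it is.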