1. Centralized Rate Limiting for ML API Gateways

    Centralized rate limiting for Machine Learning (ML) API gateways means throttling requests to the API to prevent overuse and ensure fair usage. This is often essential when deploying ML models as a service, because it helps manage computational resources effectively and maintain quality of service as usage scales.

    Pulumi allows you to define infrastructure as code, which can be used to create and configure API gateways with rate limiting rules. We will explore how to do this using AWS as an example. AWS API Gateway lets you create, publish, maintain, monitor, and secure APIs. It includes features for rate limiting and throttling, which can be configured using Pulumi.
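
    Before the full program, it is worth noting that API Gateway can throttle at two levels: usage plans (covered below) throttle per client, while stage-level method settings throttle a stage as a whole, regardless of who is calling. The latter can be expressed with the aws.apigateway.MethodSettings resource. Here is a minimal sketch, assuming api and stage refer to an existing aws.apigateway.RestApi and aws.apigateway.Stage defined elsewhere in the program:

    import pulumi_aws as aws

    # Stage-level throttling, independent of any usage plan. "api" and
    # "stage" are assumed to be an existing aws.apigateway.RestApi and
    # aws.apigateway.Stage defined elsewhere in the program.
    stage_throttle = aws.apigateway.MethodSettings("stageThrottle",
        rest_api=api.id,
        stage_name=stage.stage_name,
        method_path="*/*",  # Apply to every resource and method in the stage.
        settings=aws.apigateway.MethodSettingsSettingsArgs(
            throttling_rate_limit=10,  # Steady-state requests per second.
            throttling_burst_limit=5,  # Maximum burst above the steady rate.
        ))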

    In the following Pulumi Python program, we will create an AWS API Gateway with a usage plan and rate limiting configured. Usage plans are a key feature of AWS API Gateway that let you control who can access your APIs and how much they can use them. By associating a usage plan with an API stage and an API key (generated by API Gateway or imported from an external source), you can throttle API usage to the specified limits.
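
    In Pulumi terms, that client identity is an API key attached to the plan with a UsagePlanKey resource. The following is a minimal sketch of that association; usage_plan refers to the plan created in the full program below:

    import pulumi_aws as aws

    # Create an API key for one client; API Gateway generates the key value.
    api_key = aws.apigateway.ApiKey("mlClientKey",
        description="Key identifying one client of the ML API")

    # Attach the key to the usage plan so the plan's throttle and quota apply
    # to requests made with this key. "usage_plan" is assumed to be the
    # aws.apigateway.UsagePlan defined in the main program below.
    plan_key = aws.apigateway.UsagePlanKey("mlClientPlanKey",
        key_id=api_key.id,
        key_type="API_KEY",
        usage_plan_id=usage_plan.id)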

    Here's a program that creates an API Gateway with a usage plan defining these limits:

    import pulumi
    import pulumi_aws as aws

    # Create an API Gateway REST API.
    api_gateway = aws.apigateway.RestApi("mlApiGateway",
        description="API Gateway for ML services with rate limiting")

    # A REST API cannot be deployed until it has at least one method, so we
    # add a placeholder MOCK-integrated GET method on the root resource. In a
    # real deployment, this is where your ML endpoints would go.
    root_method = aws.apigateway.Method("mlRootMethod",
        rest_api=api_gateway.id,
        resource_id=api_gateway.root_resource_id,
        http_method="GET",
        authorization="NONE")

    root_integration = aws.apigateway.Integration("mlRootIntegration",
        rest_api=api_gateway.id,
        resource_id=api_gateway.root_resource_id,
        http_method=root_method.http_method,
        type="MOCK")

    # Create a deployment of the API; it must wait for the method to exist.
    deployment = aws.apigateway.Deployment("mlDeployment",
        rest_api=api_gateway.id,
        opts=pulumi.ResourceOptions(depends_on=[root_integration]))

    # Define the deployment stage of the API Gateway.
    stage = aws.apigateway.Stage("mlStage",
        deployment=deployment.id,
        rest_api=api_gateway.id,
        stage_name="prod")

    # Create a usage plan that associates throttle and quota limits with the stage.
    usage_plan = aws.apigateway.UsagePlan("mlUsagePlan",
        description="A usage plan that sets rate limit and quota for the API",
        api_stages=[aws.apigateway.UsagePlanApiStageArgs(
            api_id=api_gateway.id,
            stage=stage.stage_name,
        )],
        throttle_settings=aws.apigateway.UsagePlanThrottleSettingsArgs(
            rate_limit=10,  # Steady-state requests per second.
            burst_limit=5,  # Maximum requests allowed in a single burst.
        ),
        quota_settings=aws.apigateway.UsagePlanQuotaSettingsArgs(
            limit=1000,     # Maximum number of requests per period.
            period="WEEK",  # Period the limit applies to (DAY, WEEK, or MONTH).
            offset=2,       # Requests subtracted from the limit in the initial period.
        ))

    # Export the stage's invoke URL for easy access.
    pulumi.export("api_endpoint", stage.invoke_url)
    # Export the ID of the usage plan.
    pulumi.export("usage_plan_id", usage_plan.id)

    In the above program, we start by creating a REST API with the aws.apigateway.RestApi resource. Because a REST API cannot be deployed until it has at least one method, we add a placeholder MOCK-integrated method on the root resource, then create an apigateway.Deployment and an apigateway.Stage pointing at it. The stage is the snapshot of the API that users actually invoke.

    The key part for rate limiting is the apigateway.UsagePlan. We pass a list of aws.apigateway.UsagePlanApiStageArgs to associate the stage we created earlier with the plan, and we set plan-wide throttling through the throttle_settings argument. Throttles can also be attached per method within the stage association, as in the sketch below.
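
    Here is a minimal sketch of per-method throttling via the throttles argument of UsagePlanApiStageArgs, reusing api_gateway and stage from the program above. The /predict path and the limits are illustrative assumptions; the main program does not define a /predict resource:

    import pulumi_aws as aws

    # Per-method throttling inside a usage plan. The path takes the form
    # "/{resource}/{HTTP method}"; "/predict/POST" here is hypothetical.
    per_method_plan = aws.apigateway.UsagePlan("mlPerMethodPlan",
        api_stages=[aws.apigateway.UsagePlanApiStageArgs(
            api_id=api_gateway.id,
            stage=stage.stage_name,
            throttles=[aws.apigateway.UsagePlanApiStageThrottleArgs(
                path="/predict/POST",  # Throttle only POST /predict.
                rate_limit=5,          # Steady-state requests per second.
                burst_limit=2,         # Maximum burst above the steady rate.
            )],
        )])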

    Finally, we define the usage plan's throttle_settings and quota_settings. The rate_limit argument configures the steady-state average number of requests per second, while burst_limit allows short bursts above that rate. The quota_settings cap the total number of requests in a defined time period; here we allow 1000 requests per week, with an offset of 2 (per the Pulumi documentation, the number of requests subtracted from the limit in the initial period). Note that a usage plan's limits are enforced per API key, so any method you want metered must require a key, as shown in the sketch below.
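
    A sketch of that flag on a method, reusing api_gateway from the program above; the method itself is illustrative:

    import pulumi_aws as aws

    # For the usage plan's per-client limits to apply, the method must require
    # an API key; clients then send their key in the x-api-key header.
    metered_method = aws.apigateway.Method("mlMeteredMethod",
        rest_api=api_gateway.id,
        resource_id=api_gateway.root_resource_id,
        http_method="POST",
        authorization="NONE",
        api_key_required=True)  # Enforce API-key identification for metering.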

    This Pulumi program provides centralized rate limiting for your ML API Gateway, helping your service handle load efficiently and fairly. To make it fully functional, you would still need to deploy your ML model and create the corresponding resources and methods in the API Gateway, which could be part of a larger Pulumi program; a sketch of what that wiring might look like follows.
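
    For example, an inference endpoint might be wired up as below. This is a sketch under assumptions: inference_fn stands for a hypothetical aws.lambda_.Function serving the model, and the names are illustrative:

    import pulumi_aws as aws

    # Add a /predict resource under the API root.
    predict_resource = aws.apigateway.Resource("predictResource",
        rest_api=api_gateway.id,
        parent_id=api_gateway.root_resource_id,
        path_part="predict")

    # POST /predict, requiring an API key so the usage plan's limits apply.
    predict_method = aws.apigateway.Method("predictMethod",
        rest_api=api_gateway.id,
        resource_id=predict_resource.id,
        http_method="POST",
        authorization="NONE",
        api_key_required=True)

    # Proxy the request to the (assumed) Lambda function serving the model.
    predict_integration = aws.apigateway.Integration("predictIntegration",
        rest_api=api_gateway.id,
        resource_id=predict_resource.id,
        http_method=predict_method.http_method,
        type="AWS_PROXY",
        integration_http_method="POST",  # Lambda proxy integrations always use POST.
        uri=inference_fn.invoke_arn)     # invoke_arn of the hypothetical Lambda.

    A real deployment would also need an aws.lambda_.Permission granting API Gateway the right to invoke the function, and the new method would have to exist before the deployment resource is created (for example, via depends_on).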