Throttling and Monitoring AI API Usage with AWS

Pulumi · Accepted Answer

When building an AI API on AWS, you'll want to throttle requests so that overuse doesn't degrade performance, and to monitor usage so that you can analyze and optimize how the API is consumed.

We will construct a simple system using Pulumi with AWS that sets up an API Gateway with usage plans and API keys. This will help to throttle the usage and monitor the performance of the API.

1. **API Gateway RestAPI**: This will define the entry point for your AI API.
2. **API Gateway Deployment and Stage**: The deployment publishes the API and the stage names the deployed version. Throttling settings are applied at the stage level.
3. **API Gateway Usage Plan**: Here, we will attach the stage to a usage plan that specifies a quota limit and a rate limit.
4. **API Gateway API Key**: Keys created to access the API will be associated with the usage plan.
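
To build intuition for the throttling that the stage and usage plan apply, here is a minimal sketch of a token bucket, the model the API Gateway documentation uses to describe throttling. The `TokenBucket` class and `simulate` helper are hypothetical names for illustration only. The real service is distributed and enforces limits on a best-effort basis, so treat this as a simplified model rather than a description of its implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Simplified token bucket: refills `rate` tokens per second, capped at `burst`."""
    rate: float                      # steady-state requests per second (rate limit)
    burst: int                       # bucket capacity (burst limit)
    tokens: float = field(init=False, default=0.0)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (in seconds) is admitted."""
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0       # each admitted request consumes one token
            return True
        return False                 # the real API would answer HTTP 429 here


def simulate(rate: float, burst: int, arrivals: list[float]) -> list[bool]:
    """Feed non-decreasing arrival times through a fresh bucket."""
    bucket = TokenBucket(rate=rate, burst=burst)
    return [bucket.allow(t) for t in arrivals]
```

With a rate of 10 and a burst of 5, `simulate(10, 5, [0.0] * 8)` admits the first five simultaneous requests and rejects the remaining three. Half a second later the bucket has refilled and requests are admitted again.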

With this setup, you can control and monitor API usage effectively. Rate limits and quotas can be enforced, API keys can be distributed to clients for controlled access, and monitoring can be set up to examine API calls and performance. Below is a Pulumi program in Python that demonstrates how to accomplish all these steps:

```python
import pulumi
import pulumi_aws as aws

# Creating an API Gateway RestAPI
rest_api = aws.apigateway.RestApi("ai_api",
    description="AI API for throttling and monitoring",
)

# Creating a Resource within the API
resource = aws.apigateway.Resource("ai_api_resource",
    parent_id=rest_api.root_resource_id,
    path_part="endpoint",
    rest_api=rest_api.id,
)

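# Creating a GET method for the API resource
method = aws.apigateway.Method("ai_api_method",
    http_method="GET",
    resource_id=resource.id,
    rest_api=rest_api.id,
    authorization="NONE",
    api_key_required=True,  # Without this, the usage plan and API key are not enforced
)

# Attaching a placeholder MOCK integration (an assumption; replace it with your AI backend
# and add method/integration responses before serving traffic).
# A method needs an integration before the API can be deployed.
integration = aws.apigateway.Integration("ai_api_integration",
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method=method.http_method,
    type="MOCK",
    request_templates={"application/json": '{"statusCode": 200}'},
)

# Deploying the API gateway
deployment = aws.apigateway.Deployment("ai_api_deployment",
    rest_api=rest_api.id,
    opts=pulumi.ResourceOptions(depends_on=[method, integration]),
)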

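# Defining a stage which is associated with the deployment
stage = aws.apigateway.Stage("ai_api_stage",
    deployment=deployment.id,
    rest_api=rest_api.id,
    stage_name="v1",
    xray_tracing_enabled=True,  # Enable X-Ray Tracing for monitoring
)

# Applying rate limiting (throttling) settings at the stage level.
# Stage takes no throttle argument; MethodSettings with "*/*" covers every method in the stage.
stage_settings = aws.apigateway.MethodSettings("ai_api_stage_settings",
    rest_api=rest_api.id,
    stage_name=stage.stage_name,
    method_path="*/*",
    settings=aws.apigateway.MethodSettingsSettingsArgs(
        metrics_enabled=True,  # Enable detailed CloudWatch metrics for monitoring
        throttling_burst_limit=5,  # Burst capacity: how many requests can be absorbed at once
        throttling_rate_limit=10,  # The steady-state request rate limit (requests per second)
    ),
)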

# Creating an API Key for accessing the API
api_key = aws.apigateway.ApiKey("api_key",
    description="API Key for AI API",
)

# Setting up a Usage Plan for the API
usage_plan = aws.apigateway.UsagePlan("ai_api_usage_plan",
    name="ai_usage_plan",
    description="Usage plan for controlling and monitoring AI API usage",
    api_stages=[aws.apigateway.UsagePlanApiStageArgs(
        api_id=rest_api.id,
        stage=stage.stage_name,
    )],
    # Quota settings limit the number of requests that can be made in a given time period
    quota_settings=aws.apigateway.UsagePlanQuotaSettingsArgs(
        limit=1000,  # The maximum number of requests that can be made in a period
        period="WEEK",  # The period in which the limit applies ('DAY', 'WEEK', or 'MONTH')
    ),
    # Throttle settings limit the request rate and burst capacity
    throttle_settings=aws.apigateway.UsagePlanThrottleSettingsArgs(
        burst_limit=5,
        rate_limit=10,
    ),
)

# Associating the API key with the usage plan
usage_plan_key = aws.apigateway.UsagePlanKey("api_key_usage_plan_key",
    key_id=api_key.id,
    key_type="API_KEY",
    usage_plan_id=usage_plan.id,
)

# Exporting the invoke URL for the AI API (the stage already exposes its base URL)
pulumi.export("invoke_url", pulumi.Output.concat(stage.invoke_url, "/endpoint"))
```

In the above code:
- We create an API Gateway to serve as the front door for your AI API requests.
- We create a usage plan to enforce quota (the total number of requests that can be made over a given time period) and rate limits (the steady-state rate at which requests can be made).
- We create an API key and associate it with the usage plan. API Gateway applies the plan's limits per key, and only on methods that require an API key (`api_key_required=True`); clients send the key in the `x-api-key` header.
- We export the URL through which the API can be invoked.
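
On the client side, a caller sends the API key in the `x-api-key` header and should expect HTTP 429 when a throttle or quota limit is exceeded. The following is a minimal sketch using only the standard library. The helper names (`build_request`, `send`, `call_with_backoff`) and the retry parameters are assumptions for illustration, not part of the Pulumi program.

```python
import time
import urllib.error
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the API key in the x-api-key header."""
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def send(request: urllib.request.Request) -> tuple[int, bytes]:
    """Send the request and return (status, body), including for HTTP error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def call_with_backoff(send_fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send_fn until it returns a non-429 status or attempts run out.

    Waits base_delay * 2**attempt seconds between attempts (exponential backoff).
    """
    status, body = 0, b""
    for attempt in range(max_attempts):
        status, body = send_fn()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

A call would look like `call_with_backoff(lambda: send(build_request(invoke_url, key_value)))`, where `invoke_url` and `key_value` come from your stack. Backoff helps with rate throttling, but an exhausted quota also returns 429 and will not clear until the quota period resets, so keep `max_attempts` small.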

Each resource creation is accompanied by descriptive comments to help you understand their purpose. Additionally, the values used for throttling and monitoring can be adjusted to fit the expected traffic patterns and performance requirements of your AI API. Remember to review AWS costs associated with API Gateway usage, as monitoring and logging can lead to additional charges.