Kong ClusterPlugin for Rate-Limiting on ML API Endpoints

Question

Pulumi · Accepted Answer

To rate-limit ML API endpoints with Kong, you attach Kong's `rate-limiting` plugin to the Services or Routes that front those endpoints. With Pulumi, this assumes a Kong gateway is already deployed and that its Admin API is reachable by the Pulumi Kong provider. Note that `KongClusterPlugin` is a Kubernetes custom resource used by the Kong Ingress Controller. When you manage Kong directly through the Pulumi Kong provider, the equivalent is the `kong.Plugin` resource.

The `kong.Plugin` Pulumi resource configures any Kong plugin, including `rate-limiting`. A plugin applies globally when none of `service_id`, `route_id`, or `consumer_id` is set, or it can be scoped to a specific Service, Route, or Consumer.

Here’s how you can define a Kong `Plugin` resource to apply rate-limiting on your ML API using Pulumi and Python:

```python
import json

import pulumi
import pulumi_kong as kong

# Create a Kong Service that points at the upstream ML API.
ml_api_service = kong.Service("mlApiService",
    name="ml-api-service",
    protocol="http",  # Change to "https" if your API is served over HTTPS.
    host="ml-api.example.com",  # Replace with the actual host of your ML API.
    port=80,  # Change the port if your ML API listens on a different one.
)

# Kong only proxies traffic to a Service through a Route.
# The "/ml" path is a placeholder; use the path prefix your clients call.
ml_api_route = kong.Route("mlApiRoute",
    name="ml-api-route",
    protocols=["http", "https"],
    paths=["/ml"],
    service_id=ml_api_service.id,
)

# Attach the rate-limiting plugin to the whole Service,
# so it applies to every Route that targets it.
rate_limiting_plugin = kong.Plugin("rateLimitingPlugin",
    name="rate-limiting",
    service_id=ml_api_service.id,
    # Allow 100 requests per minute per consumer; adjust to your requirements.
    config_json=json.dumps({
        "minute": 100,
        "limit_by": "consumer",
        "policy": "local",
    }),
    enabled=True,
)

# Export the plugin ID as a stack output.
pulumi.export("rate_limiting_plugin_id", rate_limiting_plugin.id)
```

### Explanation:

- `kong.Service`: Represents the upstream ML API that Kong proxies requests to. Replace `"ml-api.example.com"` with the actual host name or address of your Machine Learning API.
  
- `kong.Plugin`: The Pulumi resource that attaches a Kong plugin at a given scope. The `rate-limiting` plugin caps how many requests a client can make within a time period (second, minute, hour, day, month, or year), which helps ensure fair usage of the API and prevents abuse.

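- `config_json`: The plugin configuration as a JSON string. Here it allows 100 requests per minute, counted per consumer (`"limit_by": "consumer"`). If Kong cannot identify a consumer for a request, for example because no authentication plugin is enabled, it falls back to limiting by client IP. The `local` policy keeps counters in memory on each Kong node, so when Kong runs on several nodes a client can receive up to the full limit from each node. To share counters across nodes, use the `cluster` policy (counters stored in Kong's database) or the `redis` policy.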

- `pulumi.export`: This makes the plugin ID available as a stack output, allowing you to easily retrieve the plugin’s identifier after deployment.
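To illustrate why the `local` policy matters on a multi-node Kong deployment, the sketch below models rate limiting with a simple fixed-window counter: requests are bucketed by the start of the current period, and each bucket allows at most `limit` requests. It is a simplified illustration, not Kong's implementation. `FixedWindowCounter` and `simulate` are hypothetical names, and the scenario assumes two Kong nodes behind a round-robin load balancer.

```python
class FixedWindowCounter:
    """Hypothetical fixed-window counter (an illustration, not Kong's code)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # Request counts keyed by (client key, window start timestamp).
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, key: str, now: float) -> bool:
        # Align the timestamp to the start of its window (e.g. the current minute).
        window_start = int(now // self.window_seconds) * self.window_seconds
        bucket = (key, window_start)
        used = self.counts.get(bucket, 0)
        if used >= self.limit:
            return False
        self.counts[bucket] = used + 1
        return True


def simulate(nodes: list, requests: int, key: str, now: float) -> int:
    """Spread `requests` round-robin across `nodes`; return how many are allowed."""
    allowed = 0
    for i in range(requests):
        if nodes[i % len(nodes)].allow(key, now):
            allowed += 1
    return allowed


# "local"-style: each of the two nodes keeps its own counter.
local_nodes = [FixedWindowCounter(100, 60) for _ in range(2)]
# Shared-counter style (what "cluster" or "redis" provide): one counter for all nodes.
shared = FixedWindowCounter(100, 60)

print(simulate(local_nodes, 300, "consumer-a", now=1000.0))       # 200
print(simulate([shared, shared], 300, "consumer-a", now=1000.0))  # 100
```

With per-node counters, 300 requests within one minute let 200 through even though the configured limit is 100; a shared counter enforces the limit across the cluster.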

Deploying this Pulumi program creates a Kong `Service` for your Machine Learning API and applies rate limiting to every request proxied to it. Kong routes traffic to a Service only through a Route, so make sure at least one Route targets the Service. To change the limits or how requests are counted, edit the `config_json` configuration to match your use case or business requirements.
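If you generate `config_json` from code rather than writing the JSON by hand, a small builder can catch mistakes before `pulumi up`. The sketch below uses only the standard library; `build_rate_limiting_config` is a hypothetical helper, and its checks (at least one period, known period names, positive limits, limits that do not shrink for longer periods) are our own sanity checks. Kong still validates the configuration when the plugin is created. The sketch only covers the `local` and `cluster` policies, because `redis` also needs connection settings that it does not handle.

```python
import json

# Period fields of the rate-limiting plugin, from shortest to longest.
PERIODS = ("second", "minute", "hour", "day", "month", "year")
# Policies this sketch supports; "redis" is omitted because it needs connection settings.
POLICIES = ("local", "cluster")


def build_rate_limiting_config(limits: dict[str, int],
                               limit_by: str = "consumer",
                               policy: str = "local") -> str:
    """Hypothetical helper: validate limits and return a JSON string for config_json."""
    if not limits:
        raise ValueError("set at least one period limit")
    unknown = set(limits) - set(PERIODS)
    if unknown:
        raise ValueError(f"unknown period(s): {sorted(unknown)}")
    if any(not isinstance(v, (int, float)) or v <= 0 for v in limits.values()):
        raise ValueError("limits must be positive numbers")
    # A longer period should not allow fewer requests than a shorter one.
    ordered = [limits[p] for p in PERIODS if p in limits]
    if ordered != sorted(ordered):
        raise ValueError("a longer period has a smaller limit than a shorter one")
    if policy not in POLICIES:
        raise ValueError(f"policy must be one of {POLICIES}")
    config = {**limits, "limit_by": limit_by, "policy": policy}
    return json.dumps(config, sort_keys=True)


print(build_rate_limiting_config({"minute": 100, "hour": 3000}))
# {"hour": 3000, "limit_by": "consumer", "minute": 100, "policy": "local"}
```

The returned string can be passed directly as the `config_json` argument of `kong.Plugin`.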

You will likely need to adapt this example to your ML API, your Kong deployment, and your infrastructure. It is a starting point showing how to create Kong resources with Pulumi and attach plugins such as rate limiting to them.