1. Kong Plugin for API Gateway in ML Microservices Architecture


    If you're building a microservices architecture for machine learning (ML) and want to incorporate Kong as your API Gateway, you'll want Kong to manage your API endpoints and, where needed, handle authentication, rate limiting, request transformations, and more. The plugins you apply to your Kong services or routes let you dynamically extend and control how API requests are handled.

    In this Pulumi program, we'll focus on setting up a basic example of a Kong API Gateway with a plugin. To keep things simple, we'll assume you've got services running behind Kong, and we'll add a Kong Plugin to a Service. This could be useful, for instance, if you're running an inference service for your ML workloads and you want to enforce certain API behaviors.

    In this setup, we will:

    • Create a Kong Service: Represents your backend service.
    • Add a Kong Plugin to the Service: To modify the behavior of requests to your ML service.

    Let's get started with a Pulumi program in Python for this setup.

    import pulumi
    import pulumi_kong as kong

    # Create a new Kong service representing your ML microservice.
    ml_service = kong.Service("mlService",
        name="ml-service",    # The name of your service (e.g., your machine learning service).
        protocol="http",      # The protocol your service uses (e.g., `http`, `https`).
        host="example.com",   # The host where the service is running.
        port=80,              # The port where the service is exposed.
    )

    # Add a rate-limiting plugin to the service we just created.
    # This is useful if you want to limit the number of requests a client can make to your ML service.
    rate_limiting_plugin = kong.Plugin("rateLimitingPlugin",
        name="rate-limiting",        # The name of the plugin to use.
        service_id=ml_service.id,    # Associate the plugin with the service we created earlier.
        # Configuration for the rate-limiting plugin: requests allowed per second and per hour.
        # Note that `config_json` must be valid JSON, so comments are not allowed inside the string.
        config_json="""{
            "second": 5,
            "hour": 10000
        }""",
    )

    # Export the service and plugin IDs for later reference.
    pulumi.export("ml_service_id", ml_service.id)
    pulumi.export("rate_limiting_plugin_id", rate_limiting_plugin.id)

    In this program:

    • We create a Service resource that represents your backend ML service. It is configured with the necessary parameters like name, protocol, host, and port. This backend could be serving your machine learning models, waiting for inference requests.
    • We then create a Plugin resource called rateLimitingPlugin. This plugin is of the type rate-limiting, a common plugin offered by Kong for controlling the rate at which clients can make requests to your services. We associate this plugin with our ml_service via service_id; a plugin can just as well be scoped to a route or a consumer, as sketched after this list.
    • Finally, we export the IDs of both the service and the plugin, which can be useful if you need to reference them later on in your infrastructure stack.
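
    For example, you might expose the inference endpoint on a dedicated path and require an API key on that path rather than on the whole service. Below is a minimal sketch of that idea; it assumes the ml_service from the program above and uses the pulumi_kong Route, Consumer, and ConsumerKeyAuth resources together with Kong's bundled key-auth plugin. The /predict path, the consumer name, and the key value are illustrative placeholders, so verify the resource arguments against the provider version you're using.

    import pulumi
    import pulumi_kong as kong

    # Expose the ML service on a dedicated path via a Kong route.
    # (Assumes `ml_service` is the kong.Service defined earlier.)
    inference_route = kong.Route("inferenceRoute",
        name="inference-route",
        protocols=["http", "https"],
        paths=["/predict"],          # Illustrative path for inference requests.
        strip_path=True,
        service_id=ml_service.id,
    )

    # Require an API key on that route using Kong's bundled key-auth plugin.
    key_auth_plugin = kong.Plugin("keyAuthPlugin",
        name="key-auth",
        route_id=inference_route.id,   # Scope the plugin to the route instead of the service.
    )

    # A consumer plus a key credential; by default clients send the key in the `apikey` header.
    ml_client = kong.Consumer("mlClient",
        username="ml-client",
    )

    ml_client_key = kong.ConsumerKeyAuth("mlClientKey",
        consumer_id=ml_client.id,
        key="change-me",               # Placeholder; manage real keys as secrets.
    )

    pulumi.export("inference_route_id", inference_route.id)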

    This setup places Kong as a central component in front of your ML microservices, letting you manage and monitor the traffic flowing through your APIs with plugins.
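
    On the monitoring side, Kong ships a prometheus plugin, and a plugin created without a service, route, or consumer attached is treated by Kong as global, applying to all traffic through the gateway. A minimal sketch, assuming the same pulumi_kong provider:

    import pulumi_kong as kong

    # A global plugin: with no service, route, or consumer attached, Kong applies
    # it to every request passing through the gateway. The bundled `prometheus`
    # plugin exposes request metrics that can be scraped from Kong's metrics endpoint.
    prometheus_plugin = kong.Plugin("prometheusPlugin",
        name="prometheus",
    )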

    Remember, this is a very basic setup. In a production environment, you would also need additional Kong configuration and would deploy this alongside your ML services on a cloud provider such as AWS, GCP, or Azure. You might also have different environments (staging, production) with different configurations, and Pulumi can help you maintain consistency across all of them.
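
    One common way to handle those per-environment differences is Pulumi stack configuration. The sketch below reads the rate limits from stack config instead of hard-coding them; the config key names (rateLimitPerSecond, rateLimitPerHour) are made up for this example, and ml_service is the service defined earlier.

    import json

    import pulumi
    import pulumi_kong as kong

    # Per-stack settings, e.g. `pulumi config set rateLimitPerSecond 5` in the
    # staging stack and a different value in production. Key names are illustrative.
    config = pulumi.Config()
    limits = {
        "second": config.get_int("rateLimitPerSecond") or 5,
        "hour": config.get_int("rateLimitPerHour") or 10000,
    }

    # The same rate-limiting plugin as before, but configured from the stack, so
    # staging and production can apply different limits from one program.
    rate_limiting_plugin = kong.Plugin("rateLimitingPlugin",
        name="rate-limiting",
        service_id=ml_service.id,     # `ml_service` as defined in the main program.
        config_json=json.dumps(limits),
    )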