Horizontal Pod Autoscaling with Prometheus Metrics

Question

Pulumi · Accepted Answer

Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that automatically scales the number of pods in a Deployment, StatefulSet, or ReplicaSet based on observed CPU utilization or other selected metrics. By incorporating Prometheus metrics, you can extend the HPA to scale on the custom metrics that Prometheus collects, which gives you finer-grained control over your application's scaling behavior.
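
To make the scaling behavior concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of the observed metric to its target, rounded up, and then clamped to the configured bounds. The function name and parameters are illustrative, and the real controller also handles missing metrics, pod readiness, and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric.

    This is an illustration of the documented formula, not the controller's code.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # If the metric is within the tolerance band around the target, do nothing.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Scale proportionally, round up, then clamp to the configured bounds.
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 2 replicas, an observed value of 250, and a target of 100, this sketch returns 5; with an observed value of 105 it leaves the replica count unchanged because the ratio falls inside the 10% tolerance.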

The following Pulumi program, written in Python, sets up a Kubernetes Horizontal Pod Autoscaler that scales on a custom Prometheus metric. It uses the `HorizontalPodAutoscaler` resource from the Kubernetes package, which lets you list custom metrics in the `metrics` field of its spec. The stable `autoscaling/v2` API (`kubernetes.autoscaling.v2`) is available from Kubernetes 1.23, and `autoscaling/v2beta2` was removed in Kubernetes 1.26, so prefer `v2` on current clusters. Here's how you could write such a program:

```python
import pulumi
import pulumi_kubernetes as kubernetes

# Configure a Kubernetes provider. With no explicit kubeconfig, it uses your
# ambient one (for example $KUBECONFIG or ~/.kube/config):
provider = kubernetes.Provider('myprovider')

# Look up the existing Deployment that the HPA will scale.
# Replace 'your-deployment-name' with your Deployment's name; use
# 'namespace/name' if it lives outside the default namespace:
scale_target = kubernetes.apps.v1.Deployment.get(
    'my-scale-target',
    'your-deployment-name',
    opts=pulumi.ResourceOptions(provider=provider))

# Define the Prometheus metric to scale on.
# Replace 'http_requests_total' with the metric name that your metrics adapter
# exposes through the custom metrics API (raw counters are usually exposed as
# rates instead), and adjust the target value as needed:
prometheus_metric = {
    "type": "Object",
    "object": {
        "describedObject": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": scale_target.metadata.name
        },
        "metric": {
            "name": "http_requests_total"
        },
        "target": {
            "type": "Value",
            "value": "100"
        }
    }
}

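# Create a HorizontalPodAutoscaler using the custom Prometheus metric.
# autoscaling/v2 is the stable API; v2beta2 was removed in Kubernetes 1.26.
hpa = kubernetes.autoscaling.v2.HorizontalPodAutoscaler(
    'my-hpa',
    # The HPA must live in the same namespace as its scale target.
    metadata=kubernetes.meta.v1.ObjectMetaArgs(
        namespace=scale_target.metadata.namespace,
    ),
    spec=kubernetes.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
        scale_target_ref=kubernetes.autoscaling.v2.CrossVersionObjectReferenceArgs(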
            api_version='apps/v1',
            kind='Deployment',
            name=scale_target.metadata.name,
        ),
        min_replicas=1,
        max_replicas=10,
        metrics=[prometheus_metric],
    ),
    opts=pulumi.ResourceOptions(provider=provider)
)

# Export the name of the HPA for reference purposes
pulumi.export('hpa_name', hpa.metadata.name)
```

In this program:

- We import the required modules and set up a Kubernetes provider. With no explicit configuration, the provider uses your ambient kubeconfig, so make sure it points to the cluster you intend to manage.
- We retrieve a reference to the existing Deployment that we want to scale (a StatefulSet or another scalable workload works the same way). Replace `'your-deployment-name'` with the actual name of the Deployment you intend to manage with the HPA.
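- We define the Prometheus metric we want to scale on. Replace `'http_requests_total'` with the metric name exposed to the HPA. Note that a raw counter such as `http_requests_total` only ever increases, so in practice you would scale on a rate derived from it. The `target.value` is the value the HPA tries to maintain: it adds replicas when the metric is above the target and removes them when it is below.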
- We create a `HorizontalPodAutoscaler` resource that includes the custom metric. `min_replicas` and `max_replicas` bound the number of replicas the HPA can scale to.
- We call `pulumi.export` to output the name of the HPA, which is useful when you query or manage your Kubernetes resources later.
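
The metric definition described above can also be built with small helper functions. The sketch below is illustrative only: the helper names and the `http_requests_per_second` metric name are assumptions, not part of the original program. It produces the same dictionary shape used for `prometheus_metric`, plus a `Pods` variant that targets a per-pod average, which is often a better fit for request-rate metrics.

```python
def object_metric(metric_name, target_value, kind, name, api_version="apps/v1"):
    """Build an Object-type metric entry for an autoscaling/v2 HPA spec."""
    return {
        "type": "Object",
        "object": {
            "describedObject": {
                "apiVersion": api_version,
                "kind": kind,
                "name": name,
            },
            "metric": {"name": metric_name},
            # Quantities are passed as strings, as in the original program.
            "target": {"type": "Value", "value": str(target_value)},
        },
    }


def pods_metric(metric_name, average_value):
    """Build a Pods-type metric entry, averaged across the target's pods."""
    return {
        "type": "Pods",
        "pods": {
            "metric": {"name": metric_name},
            "target": {"type": "AverageValue", "averageValue": str(average_value)},
        },
    }


# Hypothetical metric name; use whatever your adapter actually exposes.
per_pod = pods_metric("http_requests_per_second", 50)
```

Either dictionary can be placed in the `metrics` list of the HPA spec. Which type works depends on how your adapter associates the metric with Kubernetes objects.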

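Finally, make sure Prometheus is actually collecting the metric and that the metric is exposed through the Kubernetes custom metrics API (`custom.metrics.k8s.io`), which is where the HPA controller reads custom metrics. The Kubernetes Metrics Server only serves CPU and memory resource metrics, so it is not enough on its own. The Prometheus Adapter for Kubernetes Metrics APIs can be used to expose Prometheus metrics to the HPA controller.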
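
As a rough illustration of what exposing a metric involves, the sketch below builds one Prometheus Adapter discovery rule as a Python dictionary. The rule fields (`seriesQuery`, `resources.overrides`, `name`, `metricsQuery`) follow the adapter's documented configuration format, but the function name, the 2-minute rate window, and the assumption that your series carry `namespace` and `deployment` labels are all hypothetical, so adjust them to match your own metrics.

```python
import json


def adapter_rate_rule(counter_name, window="2m"):
    """Build a Prometheus Adapter rule that exposes a counter as a per-second rate.

    Assumes (hypothetically) that the series has `namespace` and `deployment` labels.
    """
    if not counter_name.endswith("_total"):
        raise ValueError("expected a counter name ending in '_total'")
    return {
        # Select only series that can be tied to a namespaced Deployment.
        "seriesQuery": f'{counter_name}{{namespace!="",deployment!=""}}',
        "resources": {
            "overrides": {
                "namespace": {"resource": "namespace"},
                "deployment": {"group": "apps", "resource": "deployment"},
            }
        },
        # Expose http_requests_total as http_requests_per_second, for example.
        "name": {"matches": "^(.*)_total$", "as": "${1}_per_second"},
        "metricsQuery": (
            "sum(rate(<<.Series>>{<<.LabelMatchers>>}[" + window + "])) "
            "by (<<.GroupBy>>)"
        ),
    }


rule = adapter_rate_rule("http_requests_total")
# JSON is valid YAML, so this output can be pasted into an adapter config.
print(json.dumps({"rules": [rule]}, indent=2))
```

With a rule like this in place, the HPA would reference the renamed metric (here `http_requests_per_second`) rather than the raw counter. How the rule is delivered to the adapter depends on your installation, for example through the values of its Helm chart.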