1. Autoscaling Elasticsearch on Kubernetes for AI Workloads

    To set up autoscaling for Elasticsearch on Kubernetes for AI workloads, deploy Elasticsearch as a StatefulSet (or, less commonly, a Deployment) in your cluster, then configure a Horizontal Pod Autoscaler (HPA) on the metrics that indicate when your AI workload needs more resources.

    The steps you would typically follow to achieve this are:

    1. Deploy Elasticsearch: Deploy Elasticsearch within your Kubernetes cluster, typically as a StatefulSet, since Elasticsearch needs stable network identities and persistent storage.

    2. Configure Metrics: Ensure the metrics you want to scale on are actually available to the HPA. CPU and memory usage come from the Kubernetes metrics API (metrics-server), while custom metrics specific to Elasticsearch's behavior under your AI workload require a metrics adapter. A minimal metrics-server install sketch follows this list.

    3. Create an HPA: Define a Horizontal Pod Autoscaler (HPA) resource that will automatically scale the number of Elasticsearch pod replicas based on the observed metrics.
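
    If your cluster does not already run metrics-server (step 2), one option is to install it from the same Pulumi program via its upstream Helm chart. This is a minimal sketch, assuming the kubernetes-sigs chart repository and that a Helm-based install is appropriate for your cluster:

    from pulumi_kubernetes.helm.v3 import Chart, ChartOpts, FetchOpts

    # Installs metrics-server so the HPA can read CPU and memory usage for pods.
    # The repo below is the upstream kubernetes-sigs chart repository; pin a chart
    # version that matches your cluster before relying on this.
    metrics_server = Chart(
        "metrics-server",
        ChartOpts(
            chart="metrics-server",
            namespace="kube-system",
            fetch_opts=FetchOpts(repo="https://kubernetes-sigs.github.io/metrics-server/"),
        ),
    )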

    Below is a Pulumi program that demonstrates how you might set up an Elasticsearch StatefulSet with autoscaling using the Pulumi Kubernetes provider. The program assumes you already have a Kubernetes cluster set up.

    Explanation and Preparation

    Before diving into the program, make sure you have Pulumi installed and configured with access to your Kubernetes cluster. You can follow the Pulumi Getting Started guide for Kubernetes if you haven't set this up yet.

    This program will:

    • Create an Elasticsearch StatefulSet with a specified number of replicas.
    • Define a Kubernetes HorizontalPodAutoscaler to automatically scale the number of Elasticsearch pods based on CPU utilization or any other custom metrics you wish to monitor.

    Let's create the above configuration using Pulumi.

    import pulumi
    from pulumi_kubernetes.apps.v1 import StatefulSet, StatefulSetSpecArgs
    from pulumi_kubernetes.autoscaling.v2 import (
        CrossVersionObjectReferenceArgs,
        HorizontalPodAutoscaler,
        HorizontalPodAutoscalerSpecArgs,
        MetricSpecArgs,
        MetricTargetArgs,
        ResourceMetricSourceArgs,
    )
    from pulumi_kubernetes.core.v1 import (
        ContainerArgs,
        EnvVarArgs,
        PersistentVolumeClaimSpecArgs,
        PodSpecArgs,
        PodTemplateSpecArgs,
        ResourceRequirementsArgs,
        VolumeMountArgs,
    )
    from pulumi_kubernetes.meta.v1 import LabelSelectorArgs, ObjectMetaArgs

    # This presumes Pulumi is already configured to talk to your Kubernetes cluster,
    # for instance via the current kubectl context.

    labels = {"app": "elasticsearch"}

    # Define the StatefulSet for Elasticsearch.
    stateful_set = StatefulSet(
        "elastic-statefulset",
        spec=StatefulSetSpecArgs(
            service_name="elasticsearch",  # Name of the headless Service governing the pods.
            replicas=3,  # Start with 3 replicas; the HPA scales this number based on the metric below.
            selector=LabelSelectorArgs(match_labels=labels),
            template=PodTemplateSpecArgs(
                metadata=ObjectMetaArgs(labels=labels),
                spec=PodSpecArgs(
                    containers=[
                        ContainerArgs(
                            name="elasticsearch",
                            image="docker.elastic.co/elasticsearch/elasticsearch:7.10.1",
                            resources=ResourceRequirementsArgs(
                                requests={"cpu": "500m", "memory": "2048Mi"},
                            ),
                            # Single-node discovery keeps the example simple; a real multi-node
                            # cluster would use seed-host based discovery instead.
                            env=[EnvVarArgs(name="discovery.type", value="single-node")],
                            volume_mounts=[
                                VolumeMountArgs(
                                    name="data",
                                    mount_path="/usr/share/elasticsearch/data",
                                ),
                            ],
                        ),
                    ],
                ),
            ),
            # volumeClaimTemplates lives on the StatefulSet spec (not the pod spec);
            # the claim name must match the volume mount name above.
            volume_claim_templates=[
                {
                    "metadata": ObjectMetaArgs(name="data"),
                    "spec": PersistentVolumeClaimSpecArgs(
                        access_modes=["ReadWriteOnce"],
                        resources={"requests": {"storage": "10Gi"}},
                    ),
                },
            ],
        ),
    )

    # Define the HorizontalPodAutoscaler to automatically scale the StatefulSet.
    hpa = HorizontalPodAutoscaler(
        "elastic-hpa",
        spec=HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="StatefulSet",
                name=stateful_set.metadata["name"],  # Points at the StatefulSet created above.
            ),
            min_replicas=2,  # Minimum number of replicas.
            max_replicas=5,  # Maximum number of replicas.
            metrics=[
                MetricSpecArgs(
                    type="Resource",
                    resource=ResourceMetricSourceArgs(
                        name="cpu",
                        target=MetricTargetArgs(
                            type="Utilization",
                            average_utilization=50,  # Target CPU utilization that triggers scaling.
                        ),
                    ),
                ),
            ],
        ),
    )

    # Export the StatefulSet name and the HPA name.
    pulumi.export("stateful_set_name", stateful_set.metadata["name"])
    pulumi.export("hpa_name", hpa.metadata["name"])

    Breakdown of the Pulumi Program

    1. StatefulSet for Elasticsearch: The StatefulSet resource represents a group of pods with persistent storage and stable network identities. We specify the Elasticsearch Docker image and the mount path for the Elasticsearch data. The spec's service_name must name a headless Service that governs the pods' DNS entries; that Service is sketched just after this breakdown.

    2. PersistentVolumeClaim: The volumeClaimTemplates field on the StatefulSet spec creates a PersistentVolumeClaim for each pod; the claim's name must match the volume mount name used by the container ("data" above). This is where Elasticsearch's persistent data is stored.

    3. HorizontalPodAutoscaler (HPA): The HorizontalPodAutoscaler resource (autoscaling/v2) scales the number of pods in the StatefulSet up or down based on the average CPU utilization reported by the metrics API.
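
    One piece the program above does not create is the headless Service that service_name="elasticsearch" refers to. Below is a minimal sketch reusing the same app: elasticsearch labels; the port list is an assumption based on Elasticsearch's standard HTTP and transport ports:

    from pulumi_kubernetes.core.v1 import Service, ServicePortArgs, ServiceSpecArgs
    from pulumi_kubernetes.meta.v1 import ObjectMetaArgs

    # Headless Service (clusterIP: None) that gives each StatefulSet pod a stable DNS name.
    elasticsearch_service = Service(
        "elasticsearch",
        metadata=ObjectMetaArgs(name="elasticsearch"),
        spec=ServiceSpecArgs(
            cluster_ip="None",
            selector={"app": "elasticsearch"},
            ports=[
                ServicePortArgs(name="http", port=9200),
                ServicePortArgs(name="transport", port=9300),
            ],
        ),
    )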

    Final Notes

    This program is a starting point. Depending on your specific AI workload and Elasticsearch configuration, you may need to adjust the Docker image version, resource requests, persistent storage size, and environment variables.
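
    One adjustment worth considering: abrupt scale-down can force Elasticsearch to re-replicate shards, so you may want to damp it with the HPA's behavior field. The sketch below assumes your pulumi_kubernetes version exposes the autoscaling/v2 behavior types; the resulting object would be passed as behavior=... alongside metrics in HorizontalPodAutoscalerSpecArgs:

    from pulumi_kubernetes.autoscaling.v2 import (
        HorizontalPodAutoscalerBehaviorArgs,
        HPAScalingRulesArgs,
    )

    # Require 10 minutes of sustained low utilization before removing a pod, so
    # short-lived dips in load do not churn the Elasticsearch cluster.
    conservative_scale_down = HorizontalPodAutoscalerBehaviorArgs(
        scale_down=HPAScalingRulesArgs(
            stabilization_window_seconds=600,
        ),
    )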

    Moreover, you might need to scale on custom metrics rather than CPU, especially if your AI workload produces metrics that are a more direct indicator of scaling pressure, such as indexing or search queue depth. Custom metrics require a metrics adapter (for example, the Prometheus Adapter) in addition to the autoscaling/v2 API used above.
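
    As an illustration, assuming a metrics adapter already publishes a hypothetical per-pod metric named elasticsearch_search_queue_depth through the custom metrics API, the Resource entry in the HPA's metrics list could be replaced with a Pods metric:

    from pulumi_kubernetes.autoscaling.v2 import (
        MetricIdentifierArgs,
        MetricSpecArgs,
        MetricTargetArgs,
        PodsMetricSourceArgs,
    )

    # "elasticsearch_search_queue_depth" is a hypothetical metric assumed to be exposed
    # per pod by a metrics adapter such as the Prometheus Adapter.
    queue_depth_metric = MetricSpecArgs(
        type="Pods",
        pods=PodsMetricSourceArgs(
            metric=MetricIdentifierArgs(name="elasticsearch_search_queue_depth"),
            target=MetricTargetArgs(
                type="AverageValue",
                average_value="10",  # Scale out when average queue depth per pod exceeds 10.
            ),
        ),
    )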

    When you're ready to deploy, save this program as the __main__.py of a Pulumi Python project (for example, one created with pulumi new python), then run pulumi up. After reviewing the proposed changes, confirm the deployment by selecting yes when prompted.