1. Auto-Scaling Solr with Kubernetes for Real-Time AI Analytics

    To set up an auto-scaling Apache Solr service on Kubernetes that is suitable for real-time AI analytics, several aspects need to be addressed:

    1. Solr on Kubernetes: Solr can run on Kubernetes as either a StatefulSet or a Deployment. A StatefulSet gives each Solr pod a stable, unique network identity and its own persistent storage, which a clustered SolrCloud setup (coordinated by ZooKeeper) typically requires; a plain Deployment is simpler and is what the example below uses. Solr can be clustered for scalability and resilience (a StatefulSet sketch follows this list).

    2. Auto-Scaling: Kubernetes supports auto-scaling at two levels. The built-in Horizontal Pod Autoscaler (HPA) scales the number of pod replicas based on observed CPU utilization or other selected metrics, while the Cluster Autoscaler, a separate component, adjusts the number of nodes in the cluster, which becomes necessary when pods need to scale beyond the capacity of the current node pool (an autoscaling/v2 variant of the HPA also follows this list).

    3. Real-Time AI Analytics: Solr suits real-time AI analytics because it can index and query large volumes of data with low latency. Auto-scaling matters in this scenario because real-time data feeds can increase load dramatically, so the Solr service must scale accordingly to maintain performance (a small indexing-and-query sketch closes this list).
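
    If you do need stable identities and per-pod storage, a StatefulSet paired with a headless Service is the usual pattern. The sketch below is illustrative rather than part of the main program; the resource names are made up, and persistent volumes and ZooKeeper are omitted for brevity.

    import pulumi_kubernetes as k8s

    # Headless Service: no virtual IP; instead each pod gets a stable DNS
    # record (solr-0.<service>, solr-1.<service>, ...).
    headless = k8s.core.v1.Service(
        "solr-headless",
        spec=k8s.core.v1.ServiceSpecArgs(
            cluster_ip="None",  # the literal string "None" marks a headless service
            selector={"app": "solr"},
            ports=[k8s.core.v1.ServicePortArgs(port=8983)],
        ))

    solr_statefulset = k8s.apps.v1.StatefulSet(
        "solr-statefulset",
        spec=k8s.apps.v1.StatefulSetSpecArgs(
            service_name=headless.metadata.name,  # ties pod DNS to the headless service
            replicas=3,
            selector=k8s.meta.v1.LabelSelectorArgs(match_labels={"app": "solr"}),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(labels={"app": "solr"}),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name="solr",
                        image="solr:8",
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=8983)],
                    )],
                ),
            ),
        ))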
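
    The main program below uses the simpler autoscaling/v1 API, which only targets CPU. If you want to scale on additional metrics, the autoscaling/v2 API accepts a list of metric specs. A hedged sketch follows; it assumes the metrics server is running in the cluster, and "solr-deployment" stands in for the target Deployment's actual name:

    import pulumi_kubernetes as k8s

    # autoscaling/v2 HPA targeting both CPU and memory utilization; the HPA
    # scales to whichever metric demands the most replicas.
    solr_hpa_v2 = k8s.autoscaling.v2.HorizontalPodAutoscaler(
        "solr-hpa-v2",
        spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name="solr-deployment",  # illustrative target name
            ),
            min_replicas=3,
            max_replicas=10,
            metrics=[
                k8s.autoscaling.v2.MetricSpecArgs(
                    type="Resource",
                    resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                        name="cpu",
                        target=k8s.autoscaling.v2.MetricTargetArgs(
                            type="Utilization", average_utilization=50),
                    ),
                ),
                k8s.autoscaling.v2.MetricSpecArgs(
                    type="Resource",
                    resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                        name="memory",
                        target=k8s.autoscaling.v2.MetricTargetArgs(
                            type="Utilization", average_utilization=70),
                    ),
                ),
            ],
        ))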
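
    To make "real-time" concrete: documents posted to Solr's HTTP update API can become searchable within about a second by using the commitWithin parameter. A minimal sketch with the requests library; the collection name analytics and the field names are made up for illustration:

    import requests

    SOLR = "http://localhost:8983/solr/analytics"  # illustrative collection URL

    # Index a document; commitWithin (milliseconds) bounds how soon it is searchable.
    requests.post(
        f"{SOLR}/update?commitWithin=1000",
        json=[{"id": "evt-1", "event_type_s": "click", "score_f": 0.87}],
    ).raise_for_status()

    # Query it back with low latency.
    resp = requests.get(f"{SOLR}/select", params={"q": "event_type_s:click"})
    print(resp.json()["response"]["numFound"])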

    The program below sets up a basic example using Pulumi to create:

    • A Deployment for Solr.
    • A Service that exposes Solr to the cluster (and potentially external clients).
    • A HorizontalPodAutoscaler to automatically scale the Solr deployment based on CPU utilization.

    Before running the Pulumi program, ensure you have:

    • Installed Pulumi and set up the Pulumi CLI.
    • Configured your Kubernetes cluster and ensured kubectl is set up to communicate with your cluster.
    • Defined any necessary Pulumi configuration for the Kubernetes provider, such as the context and namespace, if you're working in a namespace other than the default (a provider sketch follows this list).
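
    If you do target a non-default context or namespace, an explicit provider keeps that choice in code rather than in local kubectl state. A minimal sketch; "my-cluster" and "analytics" are placeholder values:

    import pulumi
    import pulumi_kubernetes as k8s

    # Explicit provider pinning the kubeconfig context and default namespace.
    k8s_provider = k8s.Provider(
        "k8s-provider",
        context="my-cluster",   # placeholder kubeconfig context
        namespace="analytics",  # placeholder namespace
    )

    # Resources opt in to the provider via resource options, for example:
    # k8s.apps.v1.Deployment("solr-deployment", spec=...,
    #     opts=pulumi.ResourceOptions(provider=k8s_provider))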

    Here's the program:

    import pulumi
    import pulumi_kubernetes as k8s

    # Define the Solr Deployment.
    solr_deployment = k8s.apps.v1.Deployment(
        "solr-deployment",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=3,  # Start with 3 replicas.
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels={"app": "solr"},
            ),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    labels={"app": "solr"},
                ),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name="solr",
                        image="solr:8",  # Official Solr image from Docker Hub.
                        ports=[k8s.core.v1.ContainerPortArgs(
                            container_port=8983,  # Default Solr port.
                        )],
                        # Resource requests/limits give the HPA a baseline; adapt as needed.
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            requests={"cpu": "500m", "memory": "1Gi"},
                            limits={"cpu": "2", "memory": "4Gi"},
                        ),
                    )],
                ),
            ),
        ))

    # Define the Service to expose Solr.
    solr_service = k8s.core.v1.Service(
        "solr-service",
        spec=k8s.core.v1.ServiceSpecArgs(
            type="LoadBalancer",  # For external access; pick the type your cloud provider supports.
            ports=[k8s.core.v1.ServicePortArgs(
                port=8983,
                target_port=8983,
            )],
            selector={"app": "solr"},
        ))

    # Define a Horizontal Pod Autoscaler to scale Solr based on CPU usage.
    solr_hpa = k8s.autoscaling.v1.HorizontalPodAutoscaler(
        "solr-hpa",
        spec=k8s.autoscaling.v1.HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=k8s.autoscaling.v1.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name=solr_deployment.metadata.name,
            ),
            min_replicas=3,   # Minimum number of replicas.
            max_replicas=10,  # Maximum number of replicas.
            target_cpu_utilization_percentage=50,  # CPU target that triggers scaling.
        ))

    # Export the service endpoint to access Solr.
    pulumi.export(
        "solr_endpoint",
        pulumi.Output.concat(
            "http://",
            solr_service.status.apply(lambda status: status.load_balancer.ingress[0].ip),
        ))

    In the program above, we create a Deployment that manages the Solr pods with an initial replica count of 3. Each pod declares resource requests and limits; the CPU request in particular is what the HorizontalPodAutoscaler measures utilization against (with a 500m request, an average usage of 250m counts as 50% utilization).

    The Service exposes the Solr deployment within the Kubernetes cluster and, depending on its type, outside it as well; on cloud providers, type LoadBalancer provisions an external load balancer.
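
    If Solr should only be reachable from inside the cluster, for example behind an ingress or consumed solely by in-cluster analytics jobs, a ClusterIP service is the safer choice. A sketch of that alternative:

    import pulumi_kubernetes as k8s

    # Internal-only alternative to the LoadBalancer service above.
    solr_internal = k8s.core.v1.Service(
        "solr-internal",
        spec=k8s.core.v1.ServiceSpecArgs(
            type="ClusterIP",  # reachable only within the cluster network
            selector={"app": "solr"},
            ports=[k8s.core.v1.ServicePortArgs(port=8983, target_port=8983)],
        ))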

    The HorizontalPodAutoscaler adjusts the number of Solr pods in the deployment based on CPU utilization, aiming to keep it around the 50% target. If utilization rises above the target, the HPA adds pods, up to the maximum of 10 replicas; if it stays below the target for an extended period, the HPA removes pods, down to the minimum of 3.
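
    The decision itself follows the documented HPA formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the configured bounds. A quick sanity check of that arithmetic in plain Python:

    import math

    def desired_replicas(current_replicas, current_util, target_util,
                         min_replicas=3, max_replicas=10):
        # Kubernetes HPA formula, clamped to the configured min/max bounds.
        desired = math.ceil(current_replicas * current_util / target_util)
        return max(min_replicas, min(max_replicas, desired))

    print(desired_replicas(3, 80, 50))  # CPU at 80% vs a 50% target -> 5 replicas
    print(desired_replicas(5, 20, 50))  # load drops -> back to the minimum of 3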

    Finally, we export the Solr service endpoint so that you can access the Solr UI and API from outside the Kubernetes cluster.
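
    One caveat on the export: some providers (AWS load balancers, for instance) populate hostname rather than ip in the ingress status, so the lambda above would return None there. A defensive variant, reusing solr_service and pulumi from the program above:

    def lb_address(status):
        # Prefer whichever field the cloud provider actually set.
        ingress = status.load_balancer.ingress[0]
        return ingress.ip or ingress.hostname

    pulumi.export(
        "solr_endpoint",
        pulumi.Output.concat("http://", solr_service.status.apply(lb_address)),
    )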

    To deploy the Solr service using this Pulumi program, save the code to a file (for instance __main__.py in a new Pulumi project), run pulumi up, and confirm the deployment when prompted.