1. Auto-scaling ML Workloads with Red Hat OpenShift


    Scaling machine learning (ML) workloads matters most when demand fluctuates unpredictably. Pulumi lets you manage and scale Kubernetes workloads declaratively, and when combined with Red Hat OpenShift you get a comprehensive platform with robust support for containerized applications. With Azure Red Hat OpenShift (ARO), you can use Kubernetes' native auto-scaling resources, such as the Horizontal Pod Autoscaler (HPA).

    To implement auto-scaling for ML workloads on Red Hat OpenShift, the key Pulumi resources will be:

    1. OpenShiftCluster (azure-native.redhatopenshift.OpenShiftCluster): To create an Azure Red Hat OpenShift cluster where the ML workloads will run.
    2. HorizontalPodAutoscaler (kubernetes.autoscaling.v2.HorizontalPodAutoscaler): To set up auto-scaling rules based on CPU or memory usage. (The older autoscaling/v2beta2 API was removed in Kubernetes 1.26, so current clusters should use autoscaling/v2.)

    The following program demonstrates how to create an Azure Red Hat OpenShift cluster and a horizontal pod autoscaler for an ML workload on Kubernetes. It assumes that you have already set up a Kubernetes Deployment for your ML application that you wish to autoscale.
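For the HPA's "Utilization" metric type to work, the target Deployment's containers must declare CPU resource requests: the controller computes utilization as a percentage of the requested CPU, and a pod without requests cannot be autoscaled this way. A minimal sketch of the relevant container spec follows; the image name and request/limit values are illustrative placeholders, not values from the original program.

```python
# Sketch: the container spec fragment the HPA's "Utilization" metric
# depends on. The controller measures usage relative to
# resources.requests.cpu, so CPU requests are mandatory here.
# Image name and request/limit values are placeholders.
ml_container_spec = {
    "name": "ml-app",
    "image": "my-registry/ml-app:latest",  # placeholder image
    "resources": {
        "requests": {"cpu": "500m", "memory": "1Gi"},
        "limits": {"cpu": "1", "memory": "2Gi"},
    },
}

# With average_utilization=50 against a 500m request, the HPA scales out
# once average per-pod CPU usage exceeds 250 millicores.
cpu_request_millicores = 500
scale_out_threshold_millicores = cpu_request_millicores * 50 // 100
print(scale_out_threshold_millicores)  # 250
```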

    import base64

    import pulumi
    import pulumi_kubernetes as k8s
    from pulumi_azure_native import redhatopenshift as openshift

    # Create an Azure Red Hat OpenShift (ARO) cluster.
    # Note: a complete ARO deployment also requires cluster_profile,
    # network_profile, service_principal_profile, apiserver_profile, and
    # ingress_profiles; they are omitted here for brevity.
    openshift_cluster = openshift.OpenShiftCluster(
        "my-openshift-cluster",
        resource_group_name="my-resource-group",
        location="eastus",
        master_profile=openshift.MasterProfileArgs(
            vm_size="Standard_D8s_v3",
        ),
        worker_profiles=[
            openshift.WorkerProfileArgs(
                name="workerprofile",
                vm_size="Standard_D4s_v3",
                count=3,
            )
        ],
    )

    # Fetch the cluster's admin credentials; the kubeconfig comes back
    # base64-encoded, so decode it before handing it to the provider.
    admin_credentials = openshift.list_open_shift_cluster_admin_credentials_output(
        resource_group_name="my-resource-group",
        resource_name=openshift_cluster.name,
    )
    kubeconfig = admin_credentials.kubeconfig.apply(
        lambda kc: base64.b64decode(kc).decode("utf-8")
    )

    # Kubernetes provider pointed at the newly created OpenShift cluster
    k8s_provider = k8s.Provider("k8s-provider", kubeconfig=kubeconfig)

    # HorizontalPodAutoscaler (autoscaling/v2) that scales the ML workload
    # based on average CPU utilization
    hpa = k8s.autoscaling.v2.HorizontalPodAutoscaler(
        "ml-workload-hpa",
        metadata=k8s.meta.v1.ObjectMetaArgs(
            name="ml-workload-hpa",
            namespace="default",  # replace with the namespace of your ML workloads
        ),
        spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
            max_replicas=10,  # upper bound when scaling out
            min_replicas=1,   # keep at least one pod running
            scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name="ml-application-deployment",  # replace with your ML deployment's name
            ),
            metrics=[
                k8s.autoscaling.v2.MetricSpecArgs(
                    type="Resource",
                    resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                        name="cpu",
                        target=k8s.autoscaling.v2.MetricTargetArgs(
                            type="Utilization",
                            average_utilization=50,  # target CPU utilization (%) that triggers scaling
                        ),
                    ),
                ),
            ],
        ),
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )

    pulumi.export("clusterName", openshift_cluster.name)
    pulumi.export("kubeconfig", pulumi.Output.secret(kubeconfig))

    This program sets up an Azure Red Hat OpenShift cluster and configures an HPA for a deployment named ml-application-deployment. The HPA increases the number of replicas of the ML deployment when average CPU utilization exceeds 50%, caps scale-out at 10 replicas, and keeps at least one pod running at all times.
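Under the hood, the HPA controller follows a simple ratio rule documented in the Kubernetes autoscaling docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped between min_replicas and max_replicas. A small illustration of that arithmetic (the function name and bounds are this sketch's own, matching the 1–10 range configured above):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count the HPA controller aims for, clamped to the bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods averaging 80% CPU against a 50% target -> scale out to 7 pods
print(desired_replicas(4, 80, 50))  # 7
# 6 pods averaging 20% CPU against a 50% target -> scale in to 3 pods
print(desired_replicas(6, 20, 50))  # 3
```

In practice the controller also applies tolerances and stabilization windows before acting, so small metric fluctuations do not cause replica churn.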

    Please ensure that your Pulumi stack is correctly configured to access the Azure cloud (e.g., via az login), and you have the necessary permissions to create these resources in the specified resource group.

    Once you deploy this program with Pulumi, the ML workloads running on the OpenShift cluster will automatically scale based on the defined criteria. This allows for an efficient use of resources by scaling up when demand is high and scaling down when demand is low, optimizing costs and performance.