1. Auto-Scaling ML Pipelines with Kubernetes HPA


    To set up auto-scaling for machine learning (ML) pipelines in a Kubernetes environment, we can use the Horizontal Pod Autoscaler (HPA). The HPA automatically scales the number of pod replicas in a Deployment or ReplicaSet based on observed CPU utilization or other selected metrics.

    It does so by comparing the observed average CPU utilization across the pods with the target specified by the user, then adding or removing replicas to bring the average back toward that target.
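
    The scaling rule can be sketched in a few lines of Python. This is a simplified model of the calculation described in the Kubernetes documentation (desired replicas = ceil(current replicas × current metric / target metric)), not the controller's actual code. The function name desired_replicas is made up for this illustration, and the 0.1 tolerance mirrors the controller's documented default.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Simplified model of the HPA replica calculation (illustrative only)."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Two pods averaging 120% against an 80% target scale out.
    print(desired_replicas(2, 120, 80))
    # Four pods averaging 40% against an 80% target scale in.
    print(desired_replicas(4, 40, 80))
```

    The real controller also applies stabilization windows and the min/max replica bounds, which this sketch leaves out.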

    The Pulumi program below, written in Python, defines an ML pipeline Deployment and a corresponding HPA resource. The program will:

    1. Create a Kubernetes Deployment that defines the desired state of the ML pipeline, including the container image to run.
    2. Define an HPA resource associated with the Deployment created in the first step. The HPA will target a certain average CPU utilization across all pods controlled by the deployment, scaling the number of replicas up or down based on this target.


    This walkthrough assumes the following:

    • You have a Kubernetes cluster up and running.
    • The Pulumi CLI and the necessary cloud providers are already set up.
    • The Docker image for the ML pipeline is available (the code uses the placeholder your-ml-pipeline-image:latest).
    • You have configured kubectl to communicate with your Kubernetes cluster.

    Now, let's start with the Pulumi program:

    import pulumi
    import pulumi_kubernetes as k8s

    # Define the ML pipeline deployment.
    ml_pipeline_deployment = k8s.apps.v1.Deployment(
        "ml-pipeline-deployment",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=2,  # Starting number of replicas
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels={"app": "ml-pipeline"},
            ),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    labels={"app": "ml-pipeline"},
                ),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[
                        k8s.core.v1.ContainerArgs(
                            name="ml-pipeline-container",
                            image="your-ml-pipeline-image:latest",  # Replace with your actual image
                            resources=k8s.core.v1.ResourceRequirementsArgs(
                                requests={
                                    "cpu": "500m",  # Requested CPU resources
                                },
                                limits={
                                    "cpu": "1000m",  # CPU resource limits
                                },
                            ),
                        ),
                    ],
                ),
            ),
        ),
    )

    # Define the Horizontal Pod Autoscaler.
    ml_pipeline_hpa = k8s.autoscaling.v1.HorizontalPodAutoscaler(
        "ml-pipeline-hpa",
        spec=k8s.autoscaling.v1.HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=k8s.autoscaling.v1.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name=ml_pipeline_deployment.metadata.name,
            ),
            min_replicas=1,  # Minimum number of replicas
            max_replicas=10,  # Maximum number of replicas
            target_cpu_utilization_percentage=80,  # Target CPU utilization percentage
        ),
        metadata=k8s.meta.v1.ObjectMetaArgs(
            name="ml-pipeline-hpa",
            labels={"app": "ml-pipeline"},
        ),
    )

    # Export the name of the deployment
    pulumi.export("ml_pipeline_deployment_name", ml_pipeline_deployment.metadata.name)

    # Export the name of the HPA
    pulumi.export("ml_pipeline_hpa_name", ml_pipeline_hpa.metadata.name)

    Here's what each part of this Pulumi program is doing:

    • The ml_pipeline_deployment resource creates a Kubernetes Deployment that specifies the desired state for the ML pipeline. We start with two replicas, and the pods carry the app: ml-pipeline label so that the Deployment's selector matches them. The HPA locates those pods through the Deployment it targets.

    • The container in the deployment specifies CPU requests and limits. The request (500m, or half a core) is the amount of CPU the scheduler reserves for the container, while the limit (1000m, or one core) is the most CPU the container is allowed to use.

    • The ml_pipeline_hpa resource defines a Horizontal Pod Autoscaler that targets the deployment we created above. The scale_target_ref links the HPA to the deployment using its API version, kind, and name.

    • The min_replicas and max_replicas fields specify the lower and upper bounds for the number of replicas the HPA can scale to.

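    • The target_cpu_utilization_percentage is the average CPU utilization that the HPA tries to maintain across all the pods; in this case, it's set to 80%. Utilization is measured relative to each container's CPU request, so with a 500m request the HPA aims for an average of about 400m of CPU usage per pod.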
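
    To make the interplay of these settings concrete, here is a rough worked example using the values from the program (500m request, 80% target, 1 to 10 replicas). It is a sketch under simplifying assumptions: the helper names parse_cpu_millicores and recommended_replicas are invented for this illustration, the usage figures are hypothetical, and the controller's tolerance band and stabilization behavior are ignored.

```python
import math


def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "1" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def recommended_replicas(current_replicas: int,
                         avg_usage_millicores: float,
                         cpu_request: str = "500m",
                         target_percent: int = 80,
                         min_replicas: int = 1,
                         max_replicas: int = 10) -> int:
    """Approximate the replica count an HPA would aim for (illustrative)."""
    request = parse_cpu_millicores(cpu_request)
    # Utilization is expressed as a percentage of the CPU *request*.
    utilization = 100.0 * avg_usage_millicores / request
    desired = math.ceil(current_replicas * utilization / target_percent)
    # The result is always kept within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical average usage figures per pod, in millicores.
    for replicas, usage in [(2, 600), (2, 100), (5, 1000)]:
        print(replicas, usage, recommended_replicas(replicas, usage))
```

    Because the limit (1000m) is higher than the request (500m), a pod can run above 100% utilization in this model, which is what lets the third case hit the max_replicas ceiling.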

    Finally, we export the names of the deployment and the HPA so they can be easily retrieved from the Pulumi stack.

    To run this program, save it as the entry point of a Pulumi Python project (by default __main__.py; for example, in a project created with pulumi new kubernetes-python), then run pulumi up. Assuming your Pulumi stack is initialized and your Kubernetes cluster is correctly configured, this sets up autoscaling for your ML pipeline.