Scheduled Scaling for Cost-Effective GPU Utilization

Question

Pulumi · Accepted Answer

Scheduled scaling lets you automatically adjust the number of compute instances in a scaling group according to a schedule that you define. This is particularly useful for workloads with predictable demand patterns, such as GPU-based machine learning or data processing jobs that run at known times of day.

You might want to scale up your resources when demand is high and scale down when it drops, so that you are not paying for idle GPUs. For this purpose, we can use an `Autoscaler` resource with predefined schedules to manage the instance count over time.

In the program below, I will demonstrate how to set up a scheduled scaling policy for a GPU-based instance group in Google Cloud Platform (GCP) using Pulumi's GCP provider. We'll create an instance template for GPU-based instances, a managed instance group to hold our instances, and an autoscaler with a basic schedule to scale the number of instances up and down at defined times.

```python
import pulumi
import pulumi_gcp as gcp

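# First, define the instance template for GPU-based instances.
gpu_instance_template = gcp.compute.InstanceTemplate("gpuInstanceTemplate",
    description="Instance template for GPU workloads",
    machine_type="n1-standard-8",  # N1 machine types accept attached GPUs; pick a size that fits your workload.
    disks=[gcp.compute.InstanceTemplateDiskArgs(
        boot=True,
        auto_delete=True,
        source_image="debian-cloud/debian-12",  # Example image; it does not ship with GPU drivers.
    )],
    # Assumes the project's "default" VPC network exists; replace with your own network.
    network_interfaces=[gcp.compute.InstanceTemplateNetworkInterfaceArgs(
        network="default",
    )],
    guest_accelerators=[gcp.compute.InstanceTemplateGuestAcceleratorArgs(
        type="nvidia-tesla-t4",  # Example GPU type; confirm it is offered in your zone.
        count=1,
    )],
    # Instances with GPUs cannot live-migrate, so host maintenance must terminate them.
    scheduling=gcp.compute.InstanceTemplateSchedulingArgs(
        on_host_maintenance="TERMINATE",
    ),
)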

# Then create a managed instance group based on this template.
gpu_instance_group = gcp.compute.InstanceGroupManager("gpuInstanceGroup",
    base_instance_name="gpu-instance",
    versions=[gcp.compute.InstanceGroupManagerVersionArgs(
        instance_template=gpu_instance_template.self_link,
    )],
    # target_size is left unset because the autoscaler below controls the group size.
    zone="us-west1-a",  # Replace with your preferred zone.
)

# Finally, define an autoscaler with scheduling.
gpu_autoscaler = gcp.compute.Autoscaler("gpuAutoscaler",
    target=gpu_instance_group.self_link,
    autoscaling_policy=gcp.compute.AutoscalerAutoscalingPolicyArgs(
        max_replicas=10,  # Maximum number of instances.
        min_replicas=1,   # Minimum number of instances when no schedule is active.
        cooldown_period=60,  # Seconds to wait before collecting metrics from a new instance.
        cpu_utilization=gcp.compute.AutoscalerAutoscalingPolicyCpuUtilizationArgs(
            target=0.6,  # Target average CPU utilization across the group.
        ),
        # A scaling schedule raises the minimum replica count while it is active.
        # When it ends, the autoscaler can shrink the group back toward min_replicas.
        scaling_schedules=[gcp.compute.AutoscalerAutoscalingPolicyScalingScheduleArgs(
            name="daytime-capacity",
            schedule="30 8 * * *",  # Starts every day at 8:30 AM.
            time_zone="UTC",
            description="Keep at least 5 instances from 8:30 AM to 8:00 PM",
            duration_sec=41400,  # 11.5 hours, so the window closes at 20:00 (8 PM).
            min_required_replicas=5,
        )],
    ),
    zone="us-west1-a",  # Replace with your preferred zone.
)

# Export the instance group URL as an output
pulumi.export("instance_group_url", gpu_instance_group.instance_group)
```

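This program sets up a scheduled scaling policy that will:

- Raise the minimum group size to 5 instances every day at 8:30 AM UTC, for the duration set on the schedule, to handle the expected increase in demand.
- Let the group scale back down toward 1 instance once that window ends (presumably during off-peak hours), provided CPU utilization stays below the target.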
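To get a rough feel for the savings, the schedule can be turned into instance-hours per day. The sketch below is a back-of-the-envelope calculation, not a billing estimate: it assumes the group sits at exactly 5 instances from 08:30 to 20:00 UTC and at 1 instance otherwise, and it ignores any extra scaling driven by the CPU signal. The function name `daily_instance_hours` is made up for illustration.

```python
def daily_instance_hours(peak_replicas, off_peak_replicas, peak_hours):
    """Instance-hours consumed in one 24-hour day under a two-level schedule."""
    if not 0 <= peak_hours <= 24:
        raise ValueError("peak_hours must be between 0 and 24")
    off_peak_hours = 24 - peak_hours
    return peak_replicas * peak_hours + off_peak_replicas * off_peak_hours


# Assumption: 08:30 to 20:00 is 11.5 hours at the higher level.
scheduled = daily_instance_hours(peak_replicas=5, off_peak_replicas=1, peak_hours=11.5)
# Baseline for comparison: 5 instances running around the clock.
always_on = daily_instance_hours(peak_replicas=5, off_peak_replicas=5, peak_hours=24)
savings = 1 - scheduled / always_on

print(f"scheduled: {scheduled} instance-hours/day")
print(f"always on: {always_on} instance-hours/day")
print(f"reduction: {savings:.1%}")
```

Under these assumptions the scheduled group uses 70 instance-hours per day instead of 120, a reduction of roughly 42%. Multiply by your own per-instance and per-GPU hourly rates to turn this into a cost figure.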

The parameters of `gpu_instance_template` define the machine type, the attached GPU, and the boot image used for each instance. Adjust them according to your needs.

The `InstanceGroupManager` resource manages a group of homogeneous instances that are based on the instance template. It ensures that a specified number of instances are running and automates the creation, deletion, and repair of instances.

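The `Autoscaler` resource is attached to the instance group and applies a policy that dictates when to scale up or down. The scaling schedules within that policy do not set an exact size. Each one sets a minimum number of instances that must be running while the schedule is active; the CPU utilization target can still add instances above that minimum, up to the configured maximum.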
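The way a schedule acts as a floor rather than a fixed size can be sketched in plain Python. This is a simplified model for illustration only, not how GCP implements it: it assumes every schedule starts once a day at a fixed UTC time and lasts no longer than a day, and the function name `effective_min_replicas` and the tuple layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def effective_min_replicas(now, base_min, schedules):
    """Return the replica floor at `now` for daily schedules.

    Each schedule is a tuple (start_hour, start_minute, duration_sec, min_required).
    """
    floor = base_min
    for hour, minute, duration_sec, min_required in schedules:
        # Check today's and yesterday's occurrence, since a window may span midnight.
        for days_back in (0, 1):
            start = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
            start -= timedelta(days=days_back)
            if start <= now < start + timedelta(seconds=duration_sec):
                # Active schedules can only raise the floor, never lower it.
                floor = max(floor, min_required)
    return floor


# Illustrative values: at least 5 replicas from 08:30 UTC for 11.5 hours.
schedules = [(8, 30, 41400, 5)]
noon = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
night = datetime(2024, 1, 15, 22, 0, tzinfo=timezone.utc)

print(effective_min_replicas(noon, 1, schedules))
print(effective_min_replicas(night, 1, schedules))
```

In this model the floor is 5 at noon and falls back to the base minimum of 1 at 22:00. A schedule whose required count equals the base minimum changes nothing, which is why scaling down is expressed by letting a window end rather than by adding a separate schedule.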

Replace the zone with one that suits your geographic requirements and offers the GPU type you need, and choose a machine type and image that are compatible with your GPU workloads.

After you deploy this program using the Pulumi CLI, it will provision the resources in GCP and your scheduled scaling will be active. Remember to check the time zone settings and cron schedules to match your requirements. Moreover, ensure that your GCP project has sufficient quotas for the resources requested by your Pulumi program.
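For the quota check, it helps to work out the worst case the autoscaler could request. The snippet below is a small arithmetic sketch using the values from the program (a maximum of 10 instances, 1 GPU per instance, and the 8 vCPUs of an `n1-standard-8`). The helper name `peak_quota_needed` is hypothetical, and the actual quota names depend on the GPU model and region, so treat the result as counts to compare against your project's quota page.

```python
def peak_quota_needed(max_replicas, gpus_per_instance, vcpus_per_instance):
    """Worst-case GPU and vCPU counts if the group scales to max_replicas."""
    return {
        "gpus": max_replicas * gpus_per_instance,
        "vcpus": max_replicas * vcpus_per_instance,
    }


# n1-standard-8 has 8 vCPUs; the template attaches 1 GPU per instance.
needed = peak_quota_needed(max_replicas=10, gpus_per_instance=1, vcpus_per_instance=8)
print(needed)
```

With these numbers the group can reach 10 GPUs and 80 vCPUs in the chosen region, so both quotas need at least that much headroom on top of anything else running in the project.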