Configure GCP Regional Instance Group Managers

The gcp:compute/regionInstanceGroupManager:RegionInstanceGroupManager resource, part of the Pulumi GCP provider, manages pools of homogeneous Compute Engine VMs across multiple zones in a region, using instance templates to define VM configuration. This guide focuses on three capabilities: health monitoring and auto-healing, canary deployments with multiple versions, and standby instances for cost-optimized scaling.

Regional instance group managers require instance templates and may reference health checks, target pools, or load balancing infrastructure. The examples are intentionally small. Combine them with your own templates, networking, and load balancing configuration.

Deploy a managed instance group with health checks

Production deployments typically spread VMs across several zones for high availability and attach a health check so that the manager automatically recreates failed instances.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const autohealing = new gcp.compute.HealthCheck("autohealing", {
    name: "autohealing-health-check",
    checkIntervalSec: 5,
    timeoutSec: 5,
    healthyThreshold: 2,
    unhealthyThreshold: 10,
    httpHealthCheck: {
        requestPath: "/healthz",
        port: 8080,
    },
});
const appserver = new gcp.compute.RegionInstanceGroupManager("appserver", {
    name: "appserver-igm",
    baseInstanceName: "app",
    region: "us-central1",
    distributionPolicyZones: [
        "us-central1-a",
        "us-central1-f",
    ],
    versions: [{
        instanceTemplate: appserverGoogleComputeInstanceTemplate.selfLinkUnique,
    }],
    allInstancesConfig: {
        metadata: {
            metadata_key: "metadata_value",
        },
        labels: {
            label_key: "label_value",
        },
    },
    targetPools: [appserverGoogleComputeTargetPool.id],
    targetSize: 2,
    namedPorts: [{
        name: "custom",
        port: 8888,
    }],
    autoHealingPolicies: {
        healthCheck: autohealing.id,
        initialDelaySec: 300,
    },
});
import pulumi
import pulumi_gcp as gcp

autohealing = gcp.compute.HealthCheck("autohealing",
    name="autohealing-health-check",
    check_interval_sec=5,
    timeout_sec=5,
    healthy_threshold=2,
    unhealthy_threshold=10,
    http_health_check={
        "request_path": "/healthz",
        "port": 8080,
    })
appserver = gcp.compute.RegionInstanceGroupManager("appserver",
    name="appserver-igm",
    base_instance_name="app",
    region="us-central1",
    distribution_policy_zones=[
        "us-central1-a",
        "us-central1-f",
    ],
    versions=[{
        "instance_template": appserver_google_compute_instance_template.self_link_unique,
    }],
    all_instances_config={
        "metadata": {
            "metadata_key": "metadata_value",
        },
        "labels": {
            "label_key": "label_value",
        },
    },
    target_pools=[appserver_google_compute_target_pool.id],
    target_size=2,
    named_ports=[{
        "name": "custom",
        "port": 8888,
    }],
    auto_healing_policies={
        "health_check": autohealing.id,
        "initial_delay_sec": 300,
    })
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/compute"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		autohealing, err := compute.NewHealthCheck(ctx, "autohealing", &compute.HealthCheckArgs{
			Name:               pulumi.String("autohealing-health-check"),
			CheckIntervalSec:   pulumi.Int(5),
			TimeoutSec:         pulumi.Int(5),
			HealthyThreshold:   pulumi.Int(2),
			UnhealthyThreshold: pulumi.Int(10),
			HttpHealthCheck: &compute.HealthCheckHttpHealthCheckArgs{
				RequestPath: pulumi.String("/healthz"),
				Port:        pulumi.Int(8080),
			},
		})
		if err != nil {
			return err
		}
		_, err = compute.NewRegionInstanceGroupManager(ctx, "appserver", &compute.RegionInstanceGroupManagerArgs{
			Name:             pulumi.String("appserver-igm"),
			BaseInstanceName: pulumi.String("app"),
			Region:           pulumi.String("us-central1"),
			DistributionPolicyZones: pulumi.StringArray{
				pulumi.String("us-central1-a"),
				pulumi.String("us-central1-f"),
			},
			Versions: compute.RegionInstanceGroupManagerVersionArray{
				&compute.RegionInstanceGroupManagerVersionArgs{
					InstanceTemplate: appserverGoogleComputeInstanceTemplate.SelfLinkUnique,
				},
			},
			AllInstancesConfig: &compute.RegionInstanceGroupManagerAllInstancesConfigArgs{
				Metadata: pulumi.StringMap{
					"metadata_key": pulumi.String("metadata_value"),
				},
				Labels: pulumi.StringMap{
					"label_key": pulumi.String("label_value"),
				},
			},
			TargetPools: pulumi.StringArray{
				appserverGoogleComputeTargetPool.ID(),
			},
			TargetSize: pulumi.Int(2),
			NamedPorts: compute.RegionInstanceGroupManagerNamedPortArray{
				&compute.RegionInstanceGroupManagerNamedPortArgs{
					Name: pulumi.String("custom"),
					Port: pulumi.Int(8888),
				},
			},
			AutoHealingPolicies: &compute.RegionInstanceGroupManagerAutoHealingPoliciesArgs{
				HealthCheck:     autohealing.ID(),
				InitialDelaySec: pulumi.Int(300),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var autohealing = new Gcp.Compute.HealthCheck("autohealing", new()
    {
        Name = "autohealing-health-check",
        CheckIntervalSec = 5,
        TimeoutSec = 5,
        HealthyThreshold = 2,
        UnhealthyThreshold = 10,
        HttpHealthCheck = new Gcp.Compute.Inputs.HealthCheckHttpHealthCheckArgs
        {
            RequestPath = "/healthz",
            Port = 8080,
        },
    });

    var appserver = new Gcp.Compute.RegionInstanceGroupManager("appserver", new()
    {
        Name = "appserver-igm",
        BaseInstanceName = "app",
        Region = "us-central1",
        DistributionPolicyZones = new[]
        {
            "us-central1-a",
            "us-central1-f",
        },
        Versions = new[]
        {
            new Gcp.Compute.Inputs.RegionInstanceGroupManagerVersionArgs
            {
                InstanceTemplate = appserverGoogleComputeInstanceTemplate.SelfLinkUnique,
            },
        },
        AllInstancesConfig = new Gcp.Compute.Inputs.RegionInstanceGroupManagerAllInstancesConfigArgs
        {
            Metadata = 
            {
                { "metadata_key", "metadata_value" },
            },
            Labels = 
            {
                { "label_key", "label_value" },
            },
        },
        TargetPools = new[]
        {
            appserverGoogleComputeTargetPool.Id,
        },
        TargetSize = 2,
        NamedPorts = new[]
        {
            new Gcp.Compute.Inputs.RegionInstanceGroupManagerNamedPortArgs
            {
                Name = "custom",
                Port = 8888,
            },
        },
        AutoHealingPolicies = new Gcp.Compute.Inputs.RegionInstanceGroupManagerAutoHealingPoliciesArgs
        {
            HealthCheck = autohealing.Id,
            InitialDelaySec = 300,
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.compute.HealthCheck;
import com.pulumi.gcp.compute.HealthCheckArgs;
import com.pulumi.gcp.compute.inputs.HealthCheckHttpHealthCheckArgs;
import com.pulumi.gcp.compute.RegionInstanceGroupManager;
import com.pulumi.gcp.compute.RegionInstanceGroupManagerArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerVersionArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerAllInstancesConfigArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerNamedPortArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerAutoHealingPoliciesArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var autohealing = new HealthCheck("autohealing", HealthCheckArgs.builder()
            .name("autohealing-health-check")
            .checkIntervalSec(5)
            .timeoutSec(5)
            .healthyThreshold(2)
            .unhealthyThreshold(10)
            .httpHealthCheck(HealthCheckHttpHealthCheckArgs.builder()
                .requestPath("/healthz")
                .port(8080)
                .build())
            .build());

        var appserver = new RegionInstanceGroupManager("appserver", RegionInstanceGroupManagerArgs.builder()
            .name("appserver-igm")
            .baseInstanceName("app")
            .region("us-central1")
            .distributionPolicyZones(            
                "us-central1-a",
                "us-central1-f")
            .versions(RegionInstanceGroupManagerVersionArgs.builder()
                .instanceTemplate(appserverGoogleComputeInstanceTemplate.selfLinkUnique())
                .build())
            .allInstancesConfig(RegionInstanceGroupManagerAllInstancesConfigArgs.builder()
                .metadata(Map.of("metadata_key", "metadata_value"))
                .labels(Map.of("label_key", "label_value"))
                .build())
            .targetPools(appserverGoogleComputeTargetPool.id())
            .targetSize(2)
            .namedPorts(RegionInstanceGroupManagerNamedPortArgs.builder()
                .name("custom")
                .port(8888)
                .build())
            .autoHealingPolicies(RegionInstanceGroupManagerAutoHealingPoliciesArgs.builder()
                .healthCheck(autohealing.id())
                .initialDelaySec(300)
                .build())
            .build());

    }
}
resources:
  autohealing:
    type: gcp:compute:HealthCheck
    properties:
      name: autohealing-health-check
      checkIntervalSec: 5
      timeoutSec: 5
      healthyThreshold: 2
      unhealthyThreshold: 10 # 50 seconds
      httpHealthCheck:
        requestPath: /healthz
        port: '8080'
  appserver:
    type: gcp:compute:RegionInstanceGroupManager
    properties:
      name: appserver-igm
      baseInstanceName: app
      region: us-central1
      distributionPolicyZones:
        - us-central1-a
        - us-central1-f
      versions:
        - instanceTemplate: ${appserverGoogleComputeInstanceTemplate.selfLinkUnique}
      allInstancesConfig:
        metadata:
          metadata_key: metadata_value
        labels:
          label_key: label_value
      targetPools:
        - ${appserverGoogleComputeTargetPool.id}
      targetSize: 2
      namedPorts:
        - name: custom
          port: 8888
      autoHealingPolicies:
        healthCheck: ${autohealing.id}
        initialDelaySec: 300

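The versions entry points the group at one instance template, and selfLinkUnique references that template by its unique ID rather than by name alone. distributionPolicyZones spreads instances across us-central1-a and us-central1-f for zone-level redundancy. allInstancesConfig applies the same metadata and labels to every VM in the group, and targetPools registers the instances with an existing target pool. autoHealingPolicies attaches a health check that probes /healthz on port 8080 every 5 seconds. After 10 consecutive failures (roughly 50 seconds), the instance is marked unhealthy and the manager recreates it. initialDelaySec gives each new VM 300 seconds to boot before autohealing acts on it. targetSize requests two instances. namedPorts maps the name custom to port 8888 so that load balancer backend services can refer to the port by name.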
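You can check the health check and autohealing timing in this example with a little arithmetic. The sketch below is plain TypeScript and does not use the Pulumi API. `HealthCheckTiming` and `describeTiming` are hypothetical names. The sketch approximates the detection window as `unhealthyThreshold × checkIntervalSec`, which ignores where the first failing probe lands. It also enforces the HealthCheck rule that `timeoutSec` may not exceed `checkIntervalSec`.

```typescript
// Hypothetical helper that models the timing of the autohealing example.
// It does not call Pulumi or the Compute Engine API.
interface HealthCheckTiming {
    checkIntervalSec: number;   // seconds between probes
    timeoutSec: number;         // seconds to wait for a probe response
    unhealthyThreshold: number; // consecutive failures before a VM is unhealthy
    initialDelaySec: number;    // autoHealingPolicies.initialDelaySec
}

interface TimingSummary {
    detectionSec: number;        // approximate time to mark a VM unhealthy
    earliestAutohealSec: number; // seconds after VM creation before autohealing may act
}

function describeTiming(t: HealthCheckTiming): TimingSummary {
    if (t.timeoutSec > t.checkIntervalSec) {
        throw new Error("timeoutSec must not be greater than checkIntervalSec");
    }
    // Approximation: one probe per interval, so N consecutive failures span about N intervals.
    const detectionSec = t.unhealthyThreshold * t.checkIntervalSec;
    return { detectionSec, earliestAutohealSec: t.initialDelaySec };
}

// Values taken from the autohealing example.
const summary = describeTiming({
    checkIntervalSec: 5,
    timeoutSec: 5,
    unhealthyThreshold: 10,
    initialDelaySec: 300,
});
console.log(summary); // { detectionSec: 50, earliestAutohealSec: 300 }
```

With these values a failing VM is detected in about 50 seconds. A VM that fails during its first five minutes is not recreated until initialDelaySec has elapsed.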

Run canary deployments with multiple template versions

When rolling out new application versions, teams often run a small canary deployment alongside the stable version to validate changes.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const appserver = new gcp.compute.RegionInstanceGroupManager("appserver", {
    name: "appserver-igm",
    baseInstanceName: "app",
    region: "us-central1",
    targetSize: 5,
    versions: [
        {
            instanceTemplate: appserverGoogleComputeInstanceTemplate.selfLinkUnique,
        },
        {
            instanceTemplate: appserver_canary.selfLinkUnique,
            targetSize: {
                fixed: 1,
            },
        },
    ],
});
import pulumi
import pulumi_gcp as gcp

appserver = gcp.compute.RegionInstanceGroupManager("appserver",
    name="appserver-igm",
    base_instance_name="app",
    region="us-central1",
    target_size=5,
    versions=[
        {
            "instance_template": appserver_google_compute_instance_template.self_link_unique,
        },
        {
            "instance_template": appserver_canary.self_link_unique,
            "target_size": {
                "fixed": 1,
            },
        },
    ])
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/compute"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := compute.NewRegionInstanceGroupManager(ctx, "appserver", &compute.RegionInstanceGroupManagerArgs{
			Name:             pulumi.String("appserver-igm"),
			BaseInstanceName: pulumi.String("app"),
			Region:           pulumi.String("us-central1"),
			TargetSize:       pulumi.Int(5),
			Versions: compute.RegionInstanceGroupManagerVersionArray{
				&compute.RegionInstanceGroupManagerVersionArgs{
					InstanceTemplate: appserverGoogleComputeInstanceTemplate.SelfLinkUnique,
				},
				&compute.RegionInstanceGroupManagerVersionArgs{
					InstanceTemplate: appserver_canary.SelfLinkUnique,
					TargetSize: &compute.RegionInstanceGroupManagerVersionTargetSizeArgs{
						Fixed: pulumi.Int(1),
					},
				},
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var appserver = new Gcp.Compute.RegionInstanceGroupManager("appserver", new()
    {
        Name = "appserver-igm",
        BaseInstanceName = "app",
        Region = "us-central1",
        TargetSize = 5,
        Versions = new[]
        {
            new Gcp.Compute.Inputs.RegionInstanceGroupManagerVersionArgs
            {
                InstanceTemplate = appserverGoogleComputeInstanceTemplate.SelfLinkUnique,
            },
            new Gcp.Compute.Inputs.RegionInstanceGroupManagerVersionArgs
            {
                InstanceTemplate = appserver_canary.SelfLinkUnique,
                TargetSize = new Gcp.Compute.Inputs.RegionInstanceGroupManagerVersionTargetSizeArgs
                {
                    Fixed = 1,
                },
            },
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.compute.RegionInstanceGroupManager;
import com.pulumi.gcp.compute.RegionInstanceGroupManagerArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerVersionArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerVersionTargetSizeArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var appserver = new RegionInstanceGroupManager("appserver", RegionInstanceGroupManagerArgs.builder()
            .name("appserver-igm")
            .baseInstanceName("app")
            .region("us-central1")
            .targetSize(5)
            .versions(            
                RegionInstanceGroupManagerVersionArgs.builder()
                    .instanceTemplate(appserverGoogleComputeInstanceTemplate.selfLinkUnique())
                    .build(),
                RegionInstanceGroupManagerVersionArgs.builder()
                    .instanceTemplate(appserver_canary.selfLinkUnique())
                    .targetSize(RegionInstanceGroupManagerVersionTargetSizeArgs.builder()
                        .fixed(1)
                        .build())
                    .build())
            .build());

    }
}
resources:
  appserver:
    type: gcp:compute:RegionInstanceGroupManager
    properties:
      name: appserver-igm
      baseInstanceName: app
      region: us-central1
      targetSize: 5
      versions:
        - instanceTemplate: ${appserverGoogleComputeInstanceTemplate.selfLinkUnique}
        - instanceTemplate: ${["appserver-canary"].selfLinkUnique}
          targetSize:
            fixed: 1

The versions array lists two instance templates. The first entry has no targetSize, so it receives every instance the other versions don't claim: four of the group's five. The second entry points at the canary template and sets targetSize.fixed to 1, so exactly one instance runs the new code. This lets you test the canary with real traffic before expanding the rollout.
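The following plain TypeScript sketch shows how a group's targetSize is divided between versions. It is a model, not the Compute Engine implementation, and `VersionSpec` and `allocateInstances` are hypothetical names. It assumes the rule that exactly one version leaves targetSize unset and absorbs the remainder. It models only targetSize.fixed and leaves targetSize.percent out. When the fixed sizes add up to more than targetSize, the sketch simply rejects them instead of reproducing how Compute Engine resolves that case.

```typescript
// Hypothetical model of how targetSize is split across versions.
// Only targetSize.fixed is modeled; targetSize.percent is deliberately omitted.
interface VersionSpec {
    name: string;
    fixedSize?: number; // corresponds to versions[].targetSize.fixed
}

function allocateInstances(targetSize: number, versions: VersionSpec[]): Map<string, number> {
    const unsized = versions.filter((v) => v.fixedSize === undefined);
    if (unsized.length !== 1) {
        throw new Error("exactly one version must leave targetSize unset");
    }
    const pinned = versions.reduce((sum, v) => sum + (v.fixedSize ?? 0), 0);
    if (pinned > targetSize) {
        // Simplification: reject rather than model how the service resolves this.
        throw new Error(`fixed sizes (${pinned}) exceed targetSize (${targetSize})`);
    }
    const allocation = new Map<string, number>();
    for (const v of versions) {
        // The single unsized version receives whatever the fixed versions leave over.
        allocation.set(v.name, v.fixedSize ?? targetSize - pinned);
    }
    return allocation;
}

// The canary example: five instances, one pinned to the canary template.
const split = allocateInstances(5, [
    { name: "stable" },
    { name: "canary", fixedSize: 1 },
]);
console.log(split); // Map(2) { 'stable' => 4, 'canary' => 1 }
```

To widen the rollout, raise the canary's fixed count, for example to 2. The stable version's share shrinks to match.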

Maintain standby instances for rapid scaling

Applications with unpredictable traffic spikes can keep instances in stopped or suspended states, ready to start quickly when demand increases.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const igm_sr = new gcp.compute.RegionInstanceGroupManager("igm-sr", {
    name: "tf-sr-igm",
    baseInstanceName: "tf-sr-igm-instance",
    region: "us-central1",
    targetSize: 5,
    versions: [{
        instanceTemplate: sr_igm.selfLink,
        name: "primary",
    }],
    standbyPolicy: {
        initialDelaySec: 50,
        mode: "SCALE_OUT_POOL",
    },
    targetSuspendedSize: 1,
    targetStoppedSize: 1,
});
import pulumi
import pulumi_gcp as gcp

igm_sr = gcp.compute.RegionInstanceGroupManager("igm-sr",
    name="tf-sr-igm",
    base_instance_name="tf-sr-igm-instance",
    region="us-central1",
    target_size=5,
    versions=[{
        "instance_template": sr_igm.self_link,
        "name": "primary",
    }],
    standby_policy={
        "initial_delay_sec": 50,
        "mode": "SCALE_OUT_POOL",
    },
    target_suspended_size=1,
    target_stopped_size=1)
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/compute"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := compute.NewRegionInstanceGroupManager(ctx, "igm-sr", &compute.RegionInstanceGroupManagerArgs{
			Name:             pulumi.String("tf-sr-igm"),
			BaseInstanceName: pulumi.String("tf-sr-igm-instance"),
			Region:           pulumi.String("us-central1"),
			TargetSize:       pulumi.Int(5),
			Versions: compute.RegionInstanceGroupManagerVersionArray{
				&compute.RegionInstanceGroupManagerVersionArgs{
					InstanceTemplate: sr_igm.SelfLink,
					Name:             pulumi.String("primary"),
				},
			},
			StandbyPolicy: &compute.RegionInstanceGroupManagerStandbyPolicyArgs{
				InitialDelaySec: pulumi.Int(50),
				Mode:            pulumi.String("SCALE_OUT_POOL"),
			},
			TargetSuspendedSize: pulumi.Int(1),
			TargetStoppedSize:   pulumi.Int(1),
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var igm_sr = new Gcp.Compute.RegionInstanceGroupManager("igm-sr", new()
    {
        Name = "tf-sr-igm",
        BaseInstanceName = "tf-sr-igm-instance",
        Region = "us-central1",
        TargetSize = 5,
        Versions = new[]
        {
            new Gcp.Compute.Inputs.RegionInstanceGroupManagerVersionArgs
            {
                InstanceTemplate = sr_igm.SelfLink,
                Name = "primary",
            },
        },
        StandbyPolicy = new Gcp.Compute.Inputs.RegionInstanceGroupManagerStandbyPolicyArgs
        {
            InitialDelaySec = 50,
            Mode = "SCALE_OUT_POOL",
        },
        TargetSuspendedSize = 1,
        TargetStoppedSize = 1,
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.compute.RegionInstanceGroupManager;
import com.pulumi.gcp.compute.RegionInstanceGroupManagerArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerVersionArgs;
import com.pulumi.gcp.compute.inputs.RegionInstanceGroupManagerStandbyPolicyArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var igm_sr = new RegionInstanceGroupManager("igm-sr", RegionInstanceGroupManagerArgs.builder()
            .name("tf-sr-igm")
            .baseInstanceName("tf-sr-igm-instance")
            .region("us-central1")
            .targetSize(5)
            .versions(RegionInstanceGroupManagerVersionArgs.builder()
                .instanceTemplate(sr_igm.selfLink())
                .name("primary")
                .build())
            .standbyPolicy(RegionInstanceGroupManagerStandbyPolicyArgs.builder()
                .initialDelaySec(50)
                .mode("SCALE_OUT_POOL")
                .build())
            .targetSuspendedSize(1)
            .targetStoppedSize(1)
            .build());

    }
}
resources:
  igm-sr:
    type: gcp:compute:RegionInstanceGroupManager
    properties:
      name: tf-sr-igm
      baseInstanceName: tf-sr-igm-instance
      region: us-central1
      targetSize: 5
      versions:
        - instanceTemplate: ${["sr-igm"].selfLink}
          name: primary
      standbyPolicy:
        initialDelaySec: 50
        mode: SCALE_OUT_POOL
      targetSuspendedSize: 1
      targetStoppedSize: 1

standbyPolicy controls how the manager uses its standby pool. With mode set to SCALE_OUT_POOL, the manager automatically resumes or starts standby VMs when the group scales out, then replenishes the pool. The default, MANUAL, leaves standby VMs untouched during scale-out. The top-level targetSuspendedSize and targetStoppedSize properties set how many VMs to keep suspended and stopped, alongside the running VMs requested by targetSize. initialDelaySec makes the manager wait 50 seconds after creating a VM before it stops or suspends that VM. This gives startup scripts time to prepare the VM for a fast scale-out.
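The sketch below works out the VM counts that this configuration targets. It is plain TypeScript with hypothetical names (`StandbyTargets`, `summarizeStandby`). It assumes that targetSize counts running VMs only, as the provider reference describes it, and that stopped and suspended standby VMs are kept in addition to them. Treat the total as an estimate to check against the Compute Engine documentation, not an authoritative figure.

```typescript
// Hypothetical summary of the VM counts a standby-enabled group targets.
// Assumption: targetSize counts running VMs; standby VMs are kept on top of it.
interface StandbyTargets {
    targetSize: number;          // running VMs
    targetStoppedSize: number;   // stopped standby VMs
    targetSuspendedSize: number; // suspended standby VMs
    initialDelaySec: number;     // standbyPolicy.initialDelaySec
}

function summarizeStandby(t: StandbyTargets, vmCreatedAtSec: number) {
    const standby = t.targetStoppedSize + t.targetSuspendedSize;
    return {
        running: t.targetSize,
        standby,
        totalVms: t.targetSize + standby,
        // A newly created VM is not stopped or suspended until the delay has passed.
        standbyEligibleAtSec: vmCreatedAtSec + t.initialDelaySec,
    };
}

// Values from the standby example, for a VM created at t = 0.
const pool = summarizeStandby(
    { targetSize: 5, targetStoppedSize: 1, targetSuspendedSize: 1, initialDelaySec: 50 },
    0,
);
console.log(pool); // { running: 5, standby: 2, totalVms: 7, standbyEligibleAtSec: 50 }
```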

Beyond these examples

These snippets focus on specific instance group manager features: health monitoring and auto-healing, canary deployments with version management, and standby instances for rapid scaling. They’re intentionally minimal rather than full VM deployment solutions.

The examples reference pre-existing infrastructure, such as instance templates and a target pool for load balancing. Only the first example creates its own health check. They focus on configuring the instance group manager rather than provisioning the underlying templates and networking.

To keep things focused, common instance group patterns are omitted, including:

  • Update policies for rolling updates (updatePolicy)
  • Stateful disk and IP preservation (statefulDisks, statefulInternalIps, statefulExternalIps)
  • Instance flexibility for mixed machine types (instanceFlexibilityPolicy)

These omissions are intentional: the goal is to illustrate how each instance group feature is wired, not to provide drop-in VM management modules. See the RegionInstanceGroupManager resource reference for all available configuration options.

Frequently Asked Questions

Configuration & Setup
When should I use a regional vs zonal instance group manager?
Use RegionInstanceGroupManager for multi-zone deployments across a region. Use gcp.compute.InstanceGroupManager for single-zone deployments.
What properties can't I change after creating the instance group manager?
The following properties are immutable: baseInstanceName, name, distributionPolicyZones, region, project, description, and params.
Should I set targetSize when using an autoscaler?
No. Set targetSize explicitly only when you manage capacity yourself. When the group is attached to an autoscaler, leave targetSize unset, because the autoscaler controls it.
How do I distribute instances across multiple zones?
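Set distributionPolicyZones to an array of zone names in the group's region (e.g., ["us-central1-a", "us-central1-f"]). The property is optional, and the canary and standby examples omit it. It is immutable, though, so choose the zones before you create the group.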
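The zone list can't be changed in place, so it can be worth validating it before deployment. This plain TypeScript sketch relies only on the Compute Engine naming convention that a zone name is the region name plus a suffix (us-central1-a belongs to us-central1). `validateZones` is a hypothetical helper, not a Pulumi function. Rejecting duplicate zones is this sketch's own design choice.

```typescript
// Hypothetical pre-flight check for distributionPolicyZones (not a Pulumi API).
// It relies only on the naming convention that a zone is "<region>-<suffix>".
function validateZones(region: string, zones: string[]): string[] {
    const prefix = `${region}-`;
    // A zone must start with "<region>-" and have a non-empty suffix after it.
    const foreign = zones.filter((z) => !z.startsWith(prefix) || z.length === prefix.length);
    if (foreign.length > 0) {
        throw new Error(`zones outside ${region}: ${foreign.join(", ")}`);
    }
    // A duplicated zone is most likely a typo, so reject it as well.
    if (new Set(zones).size !== zones.length) {
        throw new Error("distributionPolicyZones contains duplicate zones");
    }
    return zones;
}

const zones = validateZones("us-central1", ["us-central1-a", "us-central1-f"]);
console.log(zones); // [ 'us-central1-a', 'us-central1-f' ]
```

You can pass the returned array straight to distributionPolicyZones. A mistyped zone then fails the Pulumi program during preview instead of reaching Compute Engine.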
Instance Management & Updates
Why can't I update stateful disks on my instance group?
Proactive cross-zone instance redistribution must be disabled before you can update statefulDisks on an existing group. First set updatePolicy.instanceRedistributionType to NONE, then change the stateful disks.
How do I apply allInstancesConfig changes to existing instances?
After setting allInstancesConfig, you must update the group's existing instances to apply the configuration to them.
What's the difference between STABLE and UPDATED wait status?
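These are the two values of waitForInstancesStatus, which applies when waitForInstances is true. STABLE waits only until all instances are stable. UPDATED also waits until the version target is reached and per-instance configs are effective, in addition to all instances being stable.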
Deployment Strategies
How do I set up canary deployments?
Use multiple entries in the versions array. Specify targetSize.fixed on the canary version to control how many instances run the new template while others run the stable version.
What are standby instances and how do I configure them?
Standby instances are stopped or suspended VMs that the group can start or resume quickly. Configure the pool with standbyPolicy (mode and initialDelaySec). Then set the top-level targetStoppedSize and targetSuspendedSize properties to the number of VMs to keep in each state.
Health & Monitoring
How do I configure auto-healing for my instance group?
Set autoHealingPolicies to a health check ID and an initialDelaySec value (e.g., 300 seconds). The delay is how long the manager waits before applying autohealing to new or recently recreated instances, which gives them time to boot.
What happens if waitForInstances times out?
When waitForInstances is true, Pulumi waits for the group to reach the status set by waitForInstancesStatus. If the operation doesn't succeed, the provider keeps retrying until it times out, so a failing group can block a deployment for the full timeout period.
