Deploy GCP Vertex AI Models with Model Garden

The gcp:vertex/aiEndpointWithModelGardenDeployment:AiEndpointWithModelGardenDeployment resource, part of the Pulumi GCP provider, creates a Vertex AI endpoint and deploys a Model Garden or Hugging Face model to it in a single operation. This guide focuses on three capabilities: deploying pre-trained models from catalogs, configuring dedicated GPU resources, and enabling Private Service Connect networking.

Deployments require a GCP project with Vertex AI API enabled and may reference VPC networks for private connectivity. The examples are intentionally small. Combine them with your own monitoring, autoscaling, and access control configuration.

Deploy a Model Garden model to an endpoint

Teams deploying generative AI models often start with Google’s Model Garden catalog, which provides pre-trained models ready for inference without custom training.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
    publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
    location: "us-central1",
    modelConfig: {
        acceptEula: true,
    },
});
import pulumi
import pulumi_gcp as gcp

deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
    publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
    location="us-central1",
    model_config={
        "accept_eula": True,
    })
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
			PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
			Location:           pulumi.String("us-central1"),
			ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
				AcceptEula: pulumi.Bool(true),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
    {
        PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
        Location = "us-central1",
        ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
        {
            AcceptEula = true,
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
            .publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
            .location("us-central1")
            .modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
                .acceptEula(true)
                .build())
            .build());

    }
}
resources:
  deploy:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    properties:
      publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
      location: us-central1
      modelConfig:
        acceptEula: true

The publisherModelName property identifies the model using the format publishers/{publisher}/models/{model}@{version}. The modelConfig block requires acceptEula set to true, acknowledging the model’s license terms. Vertex AI provisions the endpoint and deploys the model automatically.
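Because a malformed model name only fails at deploy time, it can help to validate the publishers/{publisher}/models/{model}@{version} format up front. Here is a minimal sketch of a hypothetical helper (ours, not part of the provider) that splits the name into its components:

```python
import re

# Hypothetical validator for the publisherModelName format
# publishers/{publisher}/models/{model}@{version}; not part of the
# Pulumi GCP provider.
PUBLISHER_MODEL_RE = re.compile(
    r"^publishers/(?P<publisher>[^/]+)/models/(?P<model>[^@/]+)@(?P<version>[^@/]+)$"
)

def parse_publisher_model_name(name: str) -> dict:
    """Return the publisher, model, and version components, or raise ValueError."""
    match = PUBLISHER_MODEL_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid publisher model name: {name!r}")
    return match.groupdict()
```

Running such a check before constructing the resource turns a slow cloud-side failure into an immediate error in your program.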

Deploy a Hugging Face model from the catalog

Hugging Face hosts thousands of open-source models that can be deployed directly through Vertex AI without manual container setup.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
    huggingFaceModelId: "Qwen/Qwen3-0.6B",
    location: "us-central1",
    modelConfig: {
        acceptEula: true,
    },
});
import pulumi
import pulumi_gcp as gcp

deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
    hugging_face_model_id="Qwen/Qwen3-0.6B",
    location="us-central1",
    model_config={
        "accept_eula": True,
    })
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
			HuggingFaceModelId: pulumi.String("Qwen/Qwen3-0.6B"),
			Location:           pulumi.String("us-central1"),
			ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
				AcceptEula: pulumi.Bool(true),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
    {
        HuggingFaceModelId = "Qwen/Qwen3-0.6B",
        Location = "us-central1",
        ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
        {
            AcceptEula = true,
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
            .huggingFaceModelId("Qwen/Qwen3-0.6B")
            .location("us-central1")
            .modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
                .acceptEula(true)
                .build())
            .build());

    }
}
resources:
  deploy:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    properties:
      huggingFaceModelId: Qwen/Qwen3-0.6B
      location: us-central1
      modelConfig:
        acceptEula: true

Instead of publisherModelName, use huggingFaceModelId with the format author/model-name. Vertex AI handles container packaging and deployment. The acceptEula requirement applies to Hugging Face models as well.
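Because the two identifier styles are easy to tell apart, a small hypothetical helper (names are ours, not the provider's) can choose the right argument when the model ID comes from stack configuration:

```python
# Hypothetical helper: pick the matching resource argument for a catalog
# identifier. publisherModelName and huggingFaceModelId are mutually
# exclusive, so exactly one key is returned.
def model_source_args(model_id: str) -> dict:
    if model_id.startswith("publishers/"):
        # Model Garden format: publishers/{publisher}/models/{model}@{version}
        return {"publisher_model_name": model_id}
    # Hugging Face format: author/model-name
    return {"hugging_face_model_id": model_id}
```

The returned dict can be spread into the resource's keyword arguments, keeping a single code path for both catalogs.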

Configure dedicated compute resources for deployment

Production deployments require control over machine types, GPU accelerators, and replica counts to meet latency and throughput requirements.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
    publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
    location: "us-central1",
    modelConfig: {
        acceptEula: true,
    },
    deployConfig: {
        dedicatedResources: {
            machineSpec: {
                machineType: "g2-standard-16",
                acceleratorType: "NVIDIA_L4",
                acceleratorCount: 1,
            },
            minReplicaCount: 1,
        },
    },
});
import pulumi
import pulumi_gcp as gcp

deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
    publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
    location="us-central1",
    model_config={
        "accept_eula": True,
    },
    deploy_config={
        "dedicated_resources": {
            "machine_spec": {
                "machine_type": "g2-standard-16",
                "accelerator_type": "NVIDIA_L4",
                "accelerator_count": 1,
            },
            "min_replica_count": 1,
        },
    })
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
			PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
			Location:           pulumi.String("us-central1"),
			ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
				AcceptEula: pulumi.Bool(true),
			},
			DeployConfig: &vertex.AiEndpointWithModelGardenDeploymentDeployConfigArgs{
				DedicatedResources: &vertex.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs{
					MachineSpec: &vertex.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs{
						MachineType:      pulumi.String("g2-standard-16"),
						AcceleratorType:  pulumi.String("NVIDIA_L4"),
						AcceleratorCount: pulumi.Int(1),
					},
					MinReplicaCount: pulumi.Int(1),
				},
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
    {
        PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
        Location = "us-central1",
        ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
        {
            AcceptEula = true,
        },
        DeployConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentDeployConfigArgs
        {
            DedicatedResources = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs
            {
                MachineSpec = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs
                {
                    MachineType = "g2-standard-16",
                    AcceleratorType = "NVIDIA_L4",
                    AcceleratorCount = 1,
                },
                MinReplicaCount = 1,
            },
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentDeployConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
            .publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
            .location("us-central1")
            .modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
                .acceptEula(true)
                .build())
            .deployConfig(AiEndpointWithModelGardenDeploymentDeployConfigArgs.builder()
                .dedicatedResources(AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs.builder()
                    .machineSpec(AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs.builder()
                        .machineType("g2-standard-16")
                        .acceleratorType("NVIDIA_L4")
                        .acceleratorCount(1)
                        .build())
                    .minReplicaCount(1)
                    .build())
                .build())
            .build());

    }
}
resources:
  deploy:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    properties:
      publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
      location: us-central1
      modelConfig:
        acceptEula: true
      deployConfig:
        dedicatedResources:
          machineSpec:
            machineType: g2-standard-16
            acceleratorType: NVIDIA_L4
            acceleratorCount: 1
          minReplicaCount: 1

The deployConfig block specifies infrastructure. Inside dedicatedResources, machineSpec defines the machine type and GPU configuration: machineType sets the instance size, acceleratorType selects the GPU model (like NVIDIA_L4), and acceleratorCount determines how many GPUs attach to each replica. The minReplicaCount ensures at least one instance runs continuously.

Enable Private Service Connect for network isolation

Organizations with strict network policies use Private Service Connect to keep model inference traffic within their VPC without exposing public endpoints.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
    publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
    location: "us-central1",
    modelConfig: {
        acceptEula: true,
    },
    endpointConfig: {
        privateServiceConnectConfig: {
            enablePrivateServiceConnect: true,
            projectAllowlists: ["my-project-id"],
        },
    },
});
import pulumi
import pulumi_gcp as gcp

deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
    publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
    location="us-central1",
    model_config={
        "accept_eula": True,
    },
    endpoint_config={
        "private_service_connect_config": {
            "enable_private_service_connect": True,
            "project_allowlists": ["my-project-id"],
        },
    })
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
			PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
			Location:           pulumi.String("us-central1"),
			ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
				AcceptEula: pulumi.Bool(true),
			},
			EndpointConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigArgs{
				PrivateServiceConnectConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs{
					EnablePrivateServiceConnect: pulumi.Bool(true),
					ProjectAllowlists: pulumi.StringArray{
						pulumi.String("my-project-id"),
					},
				},
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
    {
        PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
        Location = "us-central1",
        ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
        {
            AcceptEula = true,
        },
        EndpointConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs
        {
            PrivateServiceConnectConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs
            {
                EnablePrivateServiceConnect = true,
                ProjectAllowlists = new[]
                {
                    "my-project-id",
                },
            },
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
            .publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
            .location("us-central1")
            .modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
                .acceptEula(true)
                .build())
            .endpointConfig(AiEndpointWithModelGardenDeploymentEndpointConfigArgs.builder()
                .privateServiceConnectConfig(AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs.builder()
                    .enablePrivateServiceConnect(true)
                    .projectAllowlists("my-project-id")
                    .build())
                .build())
            .build());

    }
}
resources:
  deploy:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    properties:
      publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
      location: us-central1
      modelConfig:
        acceptEula: true
      endpointConfig:
        privateServiceConnectConfig:
          enablePrivateServiceConnect: true
          projectAllowlists:
            - my-project-id

The endpointConfig block contains privateServiceConnectConfig. Set enablePrivateServiceConnect to true and list allowed project IDs in projectAllowlists. Only traffic from those projects can reach the endpoint through private network connections.
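When a stack deploys several private endpoints, it can help to define the allowlist once and reuse it. A hypothetical helper (ours, not the provider's) that builds the endpoint_config argument in the Python SDK's dict shape:

```python
# Hypothetical helper: build the endpoint_config argument that restricts
# the endpoint to Private Service Connect traffic from allowed projects.
def psc_endpoint_config(allowed_projects: list[str]) -> dict:
    if not allowed_projects:
        raise ValueError("projectAllowlists must name at least one project")
    return {
        "private_service_connect_config": {
            "enable_private_service_connect": True,
            "project_allowlists": list(allowed_projects),
        },
    }
```

Centralizing the allowlist keeps every deployment in the stack consistent when a project is added or removed.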

Automate Private Service Connect network setup

When deploying multiple models with Private Service Connect, automating the network attachment configuration reduces manual VPC setup steps.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const network = new gcp.compute.Network("network", {
    name: "network",
    autoCreateSubnetworks: false,
});
const project = gcp.organizations.getProject({});
const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
    publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
    location: "us-central1",
    modelConfig: {
        acceptEula: true,
    },
    endpointConfig: {
        privateServiceConnectConfig: {
            enablePrivateServiceConnect: true,
            projectAllowlists: [project.then(project => project.id)],
            pscAutomationConfigs: {
                projectId: project.then(project => project.id),
                network: network.id,
            },
        },
    },
});
const subnetwork = new gcp.compute.Subnetwork("subnetwork", {
    name: "subnetwork",
    ipCidrRange: "192.168.0.0/24",
    region: "us-central1",
    network: network.id,
});
import pulumi
import pulumi_gcp as gcp

network = gcp.compute.Network("network",
    name="network",
    auto_create_subnetworks=False)
project = gcp.organizations.get_project()
deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
    publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
    location="us-central1",
    model_config={
        "accept_eula": True,
    },
    endpoint_config={
        "private_service_connect_config": {
            "enable_private_service_connect": True,
            "project_allowlists": [project.id],
            "psc_automation_configs": {
                "project_id": project.id,
                "network": network.id,
            },
        },
    })
subnetwork = gcp.compute.Subnetwork("subnetwork",
    name="subnetwork",
    ip_cidr_range="192.168.0.0/24",
    region="us-central1",
    network=network.id)
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/compute"
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/organizations"
	"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		network, err := compute.NewNetwork(ctx, "network", &compute.NetworkArgs{
			Name:                  pulumi.String("network"),
			AutoCreateSubnetworks: pulumi.Bool(false),
		})
		if err != nil {
			return err
		}
		project, err := organizations.LookupProject(ctx, &organizations.LookupProjectArgs{}, nil)
		if err != nil {
			return err
		}
		_, err = vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
			PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
			Location:           pulumi.String("us-central1"),
			ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
				AcceptEula: pulumi.Bool(true),
			},
			EndpointConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigArgs{
				PrivateServiceConnectConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs{
					EnablePrivateServiceConnect: pulumi.Bool(true),
					ProjectAllowlists: pulumi.StringArray{
						pulumi.String(project.Id),
					},
					PscAutomationConfigs: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs{
						ProjectId: pulumi.String(project.Id),
						Network:   network.ID(),
					},
				},
			},
		})
		if err != nil {
			return err
		}
		_, err = compute.NewSubnetwork(ctx, "subnetwork", &compute.SubnetworkArgs{
			Name:        pulumi.String("subnetwork"),
			IpCidrRange: pulumi.String("192.168.0.0/24"),
			Region:      pulumi.String("us-central1"),
			Network:     network.ID(),
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;

return await Deployment.RunAsync(() => 
{
    var network = new Gcp.Compute.Network("network", new()
    {
        Name = "network",
        AutoCreateSubnetworks = false,
    });

    var project = Gcp.Organizations.GetProject.Invoke();

    var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
    {
        PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
        Location = "us-central1",
        ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
        {
            AcceptEula = true,
        },
        EndpointConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs
        {
            PrivateServiceConnectConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs
            {
                EnablePrivateServiceConnect = true,
                ProjectAllowlists = new[]
                {
                    project.Apply(getProjectResult => getProjectResult.Id),
                },
                PscAutomationConfigs = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs
                {
                    ProjectId = project.Apply(getProjectResult => getProjectResult.Id),
                    Network = network.Id,
                },
            },
        },
    });

    var subnetwork = new Gcp.Compute.Subnetwork("subnetwork", new()
    {
        Name = "subnetwork",
        IpCidrRange = "192.168.0.0/24",
        Region = "us-central1",
        Network = network.Id,
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.compute.Network;
import com.pulumi.gcp.compute.NetworkArgs;
import com.pulumi.gcp.organizations.OrganizationsFunctions;
import com.pulumi.gcp.organizations.inputs.GetProjectArgs;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs;
import com.pulumi.gcp.compute.Subnetwork;
import com.pulumi.gcp.compute.SubnetworkArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var network = new Network("network", NetworkArgs.builder()
            .name("network")
            .autoCreateSubnetworks(false)
            .build());

        final var project = OrganizationsFunctions.getProject(GetProjectArgs.builder()
            .build());

        var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
            .publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
            .location("us-central1")
            .modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
                .acceptEula(true)
                .build())
            .endpointConfig(AiEndpointWithModelGardenDeploymentEndpointConfigArgs.builder()
                .privateServiceConnectConfig(AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs.builder()
                    .enablePrivateServiceConnect(true)
                    .projectAllowlists(project.id())
                    .pscAutomationConfigs(AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs.builder()
                        .projectId(project.id())
                        .network(network.id())
                        .build())
                    .build())
                .build())
            .build());

        var subnetwork = new Subnetwork("subnetwork", SubnetworkArgs.builder()
            .name("subnetwork")
            .ipCidrRange("192.168.0.0/24")
            .region("us-central1")
            .network(network.id())
            .build());

    }
}
resources:
  deploy:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    properties:
      publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
      location: us-central1
      modelConfig:
        acceptEula: true
      endpointConfig:
        privateServiceConnectConfig:
          enablePrivateServiceConnect: true
          projectAllowlists:
            - ${project.id}
          pscAutomationConfigs:
            projectId: ${project.id}
            network: ${network.id}
  subnetwork:
    type: gcp:compute:Subnetwork
    properties:
      name: subnetwork
      ipCidrRange: 192.168.0.0/24
      region: us-central1
      network: ${network.id}
  network:
    type: gcp:compute:Network
    properties:
      name: network
      autoCreateSubnetworks: false
variables:
  project:
    fn::invoke:
      function: gcp:organizations:getProject
      arguments: {}

The pscAutomationConfigs block provisions the network attachment automatically. Specify projectId and reference a VPC network resource. Vertex AI creates the necessary service attachments without manual network configuration.

Beyond these examples

These snippets focus on specific endpoint deployment features: Model Garden and Hugging Face catalog deployment, dedicated compute resources and GPU configuration, and Private Service Connect networking. They’re intentionally minimal rather than full ML serving platforms.

The examples may reference pre-existing infrastructure such as GCP projects with Vertex AI API enabled, and VPC networks for PSC automation examples. They focus on configuring the deployment rather than provisioning surrounding infrastructure.

To keep things focused, common deployment patterns are omitted, including:

  • Parallel vs. sequential deployment orchestration with dependsOn
  • Autoscaling configuration (maxReplicaCount)
  • Model monitoring and logging setup
  • Custom container images for non-catalog models

These omissions are intentional: the goal is to illustrate how each deployment feature is wired, not provide drop-in ML serving modules. See the Vertex AI Endpoint with Model Garden Deployment resource reference for all available configuration options.

Let's deploy GCP Vertex AI Models with Model Garden

Get started with Pulumi Cloud, then follow our quick setup guide to deploy this infrastructure.


Frequently Asked Questions

Model Selection & Configuration
Should I use publisherModelName or huggingFaceModelId?
Use publisherModelName for Model Garden models from publishers such as Google or Meta (format: publishers/{publisher}/models/{model}@{version}), and huggingFaceModelId for Hugging Face models (format: author/model-name). The two properties are mutually exclusive.
Do I need to accept the EULA for model deployment?
Yes. Every deployment must set modelConfig.acceptEula to true to acknowledge the model's license terms.
Deployment Configuration
How do I configure machine resources for my deployment?
Use deployConfig.dedicatedResources to specify machineType, acceleratorType, acceleratorCount, and minReplicaCount. For example, g2-standard-16 with an NVIDIA_L4 accelerator, as shown above.
What properties can't I change after deployment?
The following properties are immutable and require resource recreation: location, project, deployConfig, endpointConfig, huggingFaceModelId, modelConfig, and publisherModelName.
Multiple Model Deployments
How do I deploy multiple models at the same time?
Create multiple resources without dependsOn relationships. Each deployment will proceed independently in parallel.
How do I deploy multiple models one after another?
Use dependsOn to chain deployments. Each resource depends on the previous deployment completing before starting.
Networking & Private Service Connect
How do I enable Private Service Connect for my endpoint?
Configure endpointConfig.privateServiceConnectConfig with enablePrivateServiceConnect: true and projectAllowlists. For automated network setup, add pscAutomationConfigs with projectId and network.
Resource Management
Can I import existing Vertex AI endpoints into Pulumi?
No, this resource does not support import. You must create new endpoints through Pulumi.
