The gcp:vertex/aiEndpointWithModelGardenDeployment:AiEndpointWithModelGardenDeployment resource, part of the Pulumi GCP provider, creates a Vertex AI endpoint and deploys a Model Garden or Hugging Face model to it in a single operation. This guide focuses on three capabilities: deploying pre-trained models from catalogs, configuring dedicated GPU resources, and enabling Private Service Connect networking.
Deployments require a GCP project with the Vertex AI API enabled and may reference VPC networks for private connectivity. The examples are intentionally small; combine them with your own monitoring, autoscaling, and access-control configuration.
Deploy a Model Garden model to an endpoint
Teams deploying generative AI models often start with Google’s Model Garden catalog, which provides pre-trained models ready for inference without custom training.
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
location: "us-central1",
modelConfig: {
acceptEula: true,
},
});
import pulumi
import pulumi_gcp as gcp
deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
location="us-central1",
model_config={
"accept_eula": True,
})
package main
import (
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
Location: pulumi.String("us-central1"),
ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
AcceptEula: pulumi.Bool(true),
},
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;
return await Deployment.RunAsync(() =>
{
var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
{
PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
Location = "us-central1",
ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
{
AcceptEula = true,
},
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
.publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
.location("us-central1")
.modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
.acceptEula(true)
.build())
.build());
}
}
resources:
deploy:
type: gcp:vertex:AiEndpointWithModelGardenDeployment
properties:
publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
location: us-central1
modelConfig:
acceptEula: true
The publisherModelName property identifies the model using the format publishers/{publisher}/models/{model}@{version}. The modelConfig block requires acceptEula set to true, acknowledging the model’s license terms. Vertex AI provisions the endpoint and deploys the model automatically.
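The name format lends itself to a quick sanity check before a deployment runs. A minimal sketch in plain Python 3.9+, independent of the provider (the helper name is ours):

```python
# Illustrative helper (not part of the provider SDK): split a publisher model
# name of the form publishers/{publisher}/models/{model}@{version} into its
# components so a stack can validate names before running a deployment.
def parse_publisher_model_name(name: str) -> dict:
    prefix, sep, versioned = name.partition("/models/")
    if not prefix.startswith("publishers/") or not sep:
        raise ValueError(f"not a publisher model name: {name!r}")
    model, _, version = versioned.partition("@")
    return {
        "publisher": prefix.removeprefix("publishers/"),
        "model": model,
        "version": version or None,  # version may be omitted
    }

parts = parse_publisher_model_name(
    "publishers/google/models/paligemma@paligemma-224-float32")
```

Running the check at program start surfaces a malformed name immediately, rather than partway through a long provisioning operation.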
Deploy a Hugging Face model from the catalog
Hugging Face hosts thousands of open-source models that can be deployed directly through Vertex AI without manual container setup.
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
huggingFaceModelId: "Qwen/Qwen3-0.6B",
location: "us-central1",
modelConfig: {
acceptEula: true,
},
});
import pulumi
import pulumi_gcp as gcp
deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
hugging_face_model_id="Qwen/Qwen3-0.6B",
location="us-central1",
model_config={
"accept_eula": True,
})
package main
import (
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
HuggingFaceModelId: pulumi.String("Qwen/Qwen3-0.6B"),
Location: pulumi.String("us-central1"),
ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
AcceptEula: pulumi.Bool(true),
},
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;
return await Deployment.RunAsync(() =>
{
var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
{
HuggingFaceModelId = "Qwen/Qwen3-0.6B",
Location = "us-central1",
ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
{
AcceptEula = true,
},
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
.huggingFaceModelId("Qwen/Qwen3-0.6B")
.location("us-central1")
.modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
.acceptEula(true)
.build())
.build());
}
}
resources:
deploy:
type: gcp:vertex:AiEndpointWithModelGardenDeployment
properties:
huggingFaceModelId: Qwen/Qwen3-0.6B
location: us-central1
modelConfig:
acceptEula: true
Instead of publisherModelName, use huggingFaceModelId with the format author/model-name. Vertex AI handles container packaging and deployment. The acceptEula requirement applies to Hugging Face models as well.
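Because the two identifier formats are easy to tell apart, a stack that takes a catalog id as configuration can route it to the right argument. A hypothetical convenience (the function is ours, not provider API):

```python
# Hypothetical helper: choose between the mutually exclusive catalog
# arguments based on the id's shape. Publisher model names start with
# "publishers/"; anything else of the form author/model-name is treated
# as a Hugging Face model id.
def catalog_args(model_id: str) -> dict:
    if model_id.startswith("publishers/"):
        return {"publisher_model_name": model_id}
    return {"hugging_face_model_id": model_id}
```

The returned mapping can be splatted into the Python resource constructor with `**catalog_args(model_id)`.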
Configure dedicated compute resources for deployment
Production deployments require control over machine types, GPU accelerators, and replica counts to meet latency and throughput requirements.
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
location: "us-central1",
modelConfig: {
acceptEula: true,
},
deployConfig: {
dedicatedResources: {
machineSpec: {
machineType: "g2-standard-16",
acceleratorType: "NVIDIA_L4",
acceleratorCount: 1,
},
minReplicaCount: 1,
},
},
});
import pulumi
import pulumi_gcp as gcp
deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
location="us-central1",
model_config={
"accept_eula": True,
},
deploy_config={
"dedicated_resources": {
"machine_spec": {
"machine_type": "g2-standard-16",
"accelerator_type": "NVIDIA_L4",
"accelerator_count": 1,
},
"min_replica_count": 1,
},
})
package main
import (
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
Location: pulumi.String("us-central1"),
ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
AcceptEula: pulumi.Bool(true),
},
DeployConfig: &vertex.AiEndpointWithModelGardenDeploymentDeployConfigArgs{
DedicatedResources: &vertex.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs{
MachineSpec: &vertex.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs{
MachineType: pulumi.String("g2-standard-16"),
AcceleratorType: pulumi.String("NVIDIA_L4"),
AcceleratorCount: pulumi.Int(1),
},
MinReplicaCount: pulumi.Int(1),
},
},
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;
return await Deployment.RunAsync(() =>
{
var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
{
PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
Location = "us-central1",
ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
{
AcceptEula = true,
},
DeployConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentDeployConfigArgs
{
DedicatedResources = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs
{
MachineSpec = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs
{
MachineType = "g2-standard-16",
AcceleratorType = "NVIDIA_L4",
AcceleratorCount = 1,
},
MinReplicaCount = 1,
},
},
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentDeployConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
.publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
.location("us-central1")
.modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
.acceptEula(true)
.build())
.deployConfig(AiEndpointWithModelGardenDeploymentDeployConfigArgs.builder()
.dedicatedResources(AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesArgs.builder()
.machineSpec(AiEndpointWithModelGardenDeploymentDeployConfigDedicatedResourcesMachineSpecArgs.builder()
.machineType("g2-standard-16")
.acceleratorType("NVIDIA_L4")
.acceleratorCount(1)
.build())
.minReplicaCount(1)
.build())
.build())
.build());
}
}
resources:
deploy:
type: gcp:vertex:AiEndpointWithModelGardenDeployment
properties:
publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
location: us-central1
modelConfig:
acceptEula: true
deployConfig:
dedicatedResources:
machineSpec:
machineType: g2-standard-16
acceleratorType: NVIDIA_L4
acceleratorCount: 1
minReplicaCount: 1
The deployConfig block specifies infrastructure. Inside dedicatedResources, machineSpec defines the machine type and GPU configuration: machineType sets the instance size, acceleratorType selects the GPU model (like NVIDIA_L4), and acceleratorCount determines how many GPUs attach to each replica. The minReplicaCount ensures at least one instance runs continuously.
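When several models share the same GPU shape, the nested mapping shown in the Python example can come from one place. A sketch of a reusable builder (the function name and validation are ours):

```python
# Sketch: build the deploy_config mapping passed to the Python resource
# above, with light validation of the replica count.
def gpu_deploy_config(machine_type: str, accelerator_type: str,
                      accelerator_count: int = 1,
                      min_replicas: int = 1) -> dict:
    if min_replicas < 1:
        raise ValueError("min_replica_count must be at least 1")
    return {
        "dedicated_resources": {
            "machine_spec": {
                "machine_type": machine_type,
                "accelerator_type": accelerator_type,
                "accelerator_count": accelerator_count,
            },
            "min_replica_count": min_replicas,
        },
    }
```

For example, `deploy_config=gpu_deploy_config("g2-standard-16", "NVIDIA_L4")` reproduces the configuration above.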
Enable Private Service Connect for network isolation
Organizations with strict network policies use Private Service Connect to keep model inference traffic within their VPC without exposing public endpoints.
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
location: "us-central1",
modelConfig: {
acceptEula: true,
},
endpointConfig: {
privateServiceConnectConfig: {
enablePrivateServiceConnect: true,
projectAllowlists: ["my-project-id"],
},
},
});
import pulumi
import pulumi_gcp as gcp
deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
location="us-central1",
model_config={
"accept_eula": True,
},
endpoint_config={
"private_service_connect_config": {
"enable_private_service_connect": True,
"project_allowlists": ["my-project-id"],
},
})
package main
import (
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
_, err := vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
Location: pulumi.String("us-central1"),
ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
AcceptEula: pulumi.Bool(true),
},
EndpointConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigArgs{
PrivateServiceConnectConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs{
EnablePrivateServiceConnect: pulumi.Bool(true),
ProjectAllowlists: pulumi.StringArray{
pulumi.String("my-project-id"),
},
},
},
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;
return await Deployment.RunAsync(() =>
{
var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
{
PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
Location = "us-central1",
ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
{
AcceptEula = true,
},
EndpointConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs
{
PrivateServiceConnectConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs
{
EnablePrivateServiceConnect = true,
ProjectAllowlists = new[]
{
"my-project-id",
},
},
},
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
.publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
.location("us-central1")
.modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
.acceptEula(true)
.build())
.endpointConfig(AiEndpointWithModelGardenDeploymentEndpointConfigArgs.builder()
.privateServiceConnectConfig(AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs.builder()
.enablePrivateServiceConnect(true)
.projectAllowlists("my-project-id")
.build())
.build())
.build());
}
}
resources:
deploy:
type: gcp:vertex:AiEndpointWithModelGardenDeployment
properties:
publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
location: us-central1
modelConfig:
acceptEula: true
endpointConfig:
privateServiceConnectConfig:
enablePrivateServiceConnect: true
projectAllowlists:
- my-project-id
The endpointConfig block contains privateServiceConnectConfig. Set enablePrivateServiceConnect to true and list allowed project IDs in projectAllowlists. Only traffic from those projects can reach the endpoint through private network connections.
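A small helper (the name and the validation are ours, not provider API) can build this mapping and refuse an empty allowlist, since an endpoint that no project can reach is almost certainly a configuration mistake:

```python
# Sketch: build the endpoint_config mapping for a Private Service Connect
# endpoint; an empty allowlist is rejected as a likely configuration error.
def psc_endpoint_config(project_allowlist: list) -> dict:
    if not project_allowlist:
        raise ValueError("project_allowlists must contain at least one project")
    return {
        "private_service_connect_config": {
            "enable_private_service_connect": True,
            "project_allowlists": list(project_allowlist),
        },
    }
```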
Automate Private Service Connect network setup
When deploying multiple models with Private Service Connect, automating the network attachment configuration reduces manual VPC setup steps.
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
const network = new gcp.compute.Network("network", {
name: "network",
autoCreateSubnetworks: false,
});
const project = gcp.organizations.getProject({});
const deploy = new gcp.vertex.AiEndpointWithModelGardenDeployment("deploy", {
publisherModelName: "publishers/google/models/paligemma@paligemma-224-float32",
location: "us-central1",
modelConfig: {
acceptEula: true,
},
endpointConfig: {
privateServiceConnectConfig: {
enablePrivateServiceConnect: true,
projectAllowlists: [project.then(project => project.id)],
pscAutomationConfigs: {
projectId: project.then(project => project.id),
network: network.id,
},
},
},
});
const subnetwork = new gcp.compute.Subnetwork("subnetwork", {
name: "subnetwork",
ipCidrRange: "192.168.0.0/24",
region: "us-central1",
network: network.id,
});
import pulumi
import pulumi_gcp as gcp
network = gcp.compute.Network("network",
name="network",
auto_create_subnetworks=False)
project = gcp.organizations.get_project()
deploy = gcp.vertex.AiEndpointWithModelGardenDeployment("deploy",
publisher_model_name="publishers/google/models/paligemma@paligemma-224-float32",
location="us-central1",
model_config={
"accept_eula": True,
},
endpoint_config={
"private_service_connect_config": {
"enable_private_service_connect": True,
"project_allowlists": [project.id],
"psc_automation_configs": {
"project_id": project.id,
"network": network.id,
},
},
})
subnetwork = gcp.compute.Subnetwork("subnetwork",
name="subnetwork",
ip_cidr_range="192.168.0.0/24",
region="us-central1",
network=network.id)
package main
import (
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/compute"
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/organizations"
"github.com/pulumi/pulumi-gcp/sdk/v9/go/gcp/vertex"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
network, err := compute.NewNetwork(ctx, "network", &compute.NetworkArgs{
Name: pulumi.String("network"),
AutoCreateSubnetworks: pulumi.Bool(false),
})
if err != nil {
return err
}
project, err := organizations.LookupProject(ctx, &organizations.LookupProjectArgs{}, nil)
if err != nil {
return err
}
_, err = vertex.NewAiEndpointWithModelGardenDeployment(ctx, "deploy", &vertex.AiEndpointWithModelGardenDeploymentArgs{
PublisherModelName: pulumi.String("publishers/google/models/paligemma@paligemma-224-float32"),
Location: pulumi.String("us-central1"),
ModelConfig: &vertex.AiEndpointWithModelGardenDeploymentModelConfigArgs{
AcceptEula: pulumi.Bool(true),
},
EndpointConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigArgs{
PrivateServiceConnectConfig: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs{
EnablePrivateServiceConnect: pulumi.Bool(true),
ProjectAllowlists: pulumi.StringArray{
pulumi.String(project.Id),
},
PscAutomationConfigs: &vertex.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs{
ProjectId: pulumi.String(project.Id),
Network: network.ID(),
},
},
},
})
if err != nil {
return err
}
_, err = compute.NewSubnetwork(ctx, "subnetwork", &compute.SubnetworkArgs{
Name: pulumi.String("subnetwork"),
IpCidrRange: pulumi.String("192.168.0.0/24"),
Region: pulumi.String("us-central1"),
Network: network.ID(),
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Gcp = Pulumi.Gcp;
return await Deployment.RunAsync(() =>
{
var network = new Gcp.Compute.Network("network", new()
{
Name = "network",
AutoCreateSubnetworks = false,
});
var project = Gcp.Organizations.GetProject.Invoke();
var deploy = new Gcp.Vertex.AiEndpointWithModelGardenDeployment("deploy", new()
{
PublisherModelName = "publishers/google/models/paligemma@paligemma-224-float32",
Location = "us-central1",
ModelConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs
{
AcceptEula = true,
},
EndpointConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs
{
PrivateServiceConnectConfig = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs
{
EnablePrivateServiceConnect = true,
ProjectAllowlists = new[]
{
project.Apply(getProjectResult => getProjectResult.Id),
},
PscAutomationConfigs = new Gcp.Vertex.Inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs
{
ProjectId = project.Apply(getProjectResult => getProjectResult.Id),
Network = network.Id,
},
},
},
});
var subnetwork = new Gcp.Compute.Subnetwork("subnetwork", new()
{
Name = "subnetwork",
IpCidrRange = "192.168.0.0/24",
Region = "us-central1",
Network = network.Id,
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.gcp.compute.Network;
import com.pulumi.gcp.compute.NetworkArgs;
import com.pulumi.gcp.organizations.OrganizationsFunctions;
import com.pulumi.gcp.organizations.inputs.GetProjectArgs;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeployment;
import com.pulumi.gcp.vertex.AiEndpointWithModelGardenDeploymentArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentModelConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs;
import com.pulumi.gcp.vertex.inputs.AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs;
import com.pulumi.gcp.compute.Subnetwork;
import com.pulumi.gcp.compute.SubnetworkArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var network = new Network("network", NetworkArgs.builder()
.name("network")
.autoCreateSubnetworks(false)
.build());
final var project = OrganizationsFunctions.getProject(GetProjectArgs.builder()
.build());
var deploy = new AiEndpointWithModelGardenDeployment("deploy", AiEndpointWithModelGardenDeploymentArgs.builder()
.publisherModelName("publishers/google/models/paligemma@paligemma-224-float32")
.location("us-central1")
.modelConfig(AiEndpointWithModelGardenDeploymentModelConfigArgs.builder()
.acceptEula(true)
.build())
.endpointConfig(AiEndpointWithModelGardenDeploymentEndpointConfigArgs.builder()
.privateServiceConnectConfig(AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigArgs.builder()
.enablePrivateServiceConnect(true)
.projectAllowlists(project.id())
.pscAutomationConfigs(AiEndpointWithModelGardenDeploymentEndpointConfigPrivateServiceConnectConfigPscAutomationConfigsArgs.builder()
.projectId(project.id())
.network(network.id())
.build())
.build())
.build())
.build());
var subnetwork = new Subnetwork("subnetwork", SubnetworkArgs.builder()
.name("subnetwork")
.ipCidrRange("192.168.0.0/24")
.region("us-central1")
.network(network.id())
.build());
}
}
resources:
deploy:
type: gcp:vertex:AiEndpointWithModelGardenDeployment
properties:
publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
location: us-central1
modelConfig:
acceptEula: true
endpointConfig:
privateServiceConnectConfig:
enablePrivateServiceConnect: true
projectAllowlists:
- ${project.id}
pscAutomationConfigs:
projectId: ${project.id}
network: ${network.id}
subnetwork:
type: gcp:compute:Subnetwork
properties:
name: subnetwork
ipCidrRange: 192.168.0.0/24
region: us-central1
network: ${network.id}
network:
type: gcp:compute:Network
properties:
name: network
autoCreateSubnetworks: false
variables:
project:
fn::invoke:
function: gcp:organizations:getProject
arguments: {}
The pscAutomationConfigs block provisions the network attachment automatically. Specify projectId and reference a VPC network resource. Vertex AI creates the necessary service attachments without manual network configuration.
Beyond these examples
These snippets focus on specific endpoint deployment features: Model Garden and Hugging Face catalog deployment, dedicated compute resources and GPU configuration, and Private Service Connect networking. They’re intentionally minimal rather than full ML serving platforms.
The examples may reference pre-existing infrastructure, such as a GCP project with the Vertex AI API enabled and, for the PSC automation example, a VPC network. They focus on configuring the deployment rather than provisioning the surrounding infrastructure.
To keep things focused, common deployment patterns are omitted, including:
- Parallel vs. sequential deployment orchestration with dependsOn
- Autoscaling configuration (maxReplicaCount)
- Model monitoring and logging setup
- Custom container images for non-catalog models
These omissions are intentional: the goal is to illustrate how each deployment feature is wired, not provide drop-in ML serving modules. See the Vertex AI Endpoint with Model Garden Deployment resource reference for all available configuration options.
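For reference, the sequential orchestration pattern mentioned above can be expressed in Pulumi YAML through the resource options.dependsOn field; the resource names first and second below are illustrative:

```yaml
resources:
  first:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    properties:
      publisherModelName: publishers/google/models/paligemma@paligemma-224-float32
      location: us-central1
      modelConfig:
        acceptEula: true
  second:
    type: gcp:vertex:AiEndpointWithModelGardenDeployment
    options:
      dependsOn:
        - ${first}
    properties:
      huggingFaceModelId: Qwen/Qwen3-0.6B
      location: us-central1
      modelConfig:
        acceptEula: true
```

Omitting the options block lets both deployments proceed in parallel instead.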
Frequently Asked Questions
Model Selection & Configuration
Use publisherModelName for Model Garden models from Google or Meta publishers (format: publishers/{publisher}/models/{model}@{version}), and huggingFaceModelId for Hugging Face models (format: author/model-name). The two options are mutually exclusive.
Every deployment must set modelConfig.acceptEula: true; EULA acceptance is required for deployment.
Deployment Configuration
Use deployConfig.dedicatedResources to specify machineType, acceleratorType, acceleratorCount, and minReplicaCount. For example, g2-standard-12 with NVIDIA_L4 accelerators.
The resource's configurable properties include location, project, deployConfig, endpointConfig, huggingFaceModelId, modelConfig, and publisherModelName.
Multiple Model Deployments
Deployments without dependsOn relationships proceed independently in parallel.
Use dependsOn to chain deployments so that each resource waits for the previous deployment to complete before starting.
Networking & Private Service Connect
Configure endpointConfig.privateServiceConnectConfig with enablePrivateServiceConnect: true and projectAllowlists. For automated network setup, add pscAutomationConfigs with projectId and network.