Does this deploy model training or data pipelines?

No. The blueprint only deploys runtime infrastructure for invoking a managed foundation model from an HTTP endpoint.

Where do model identifiers and endpoints come from?

Each starter reads provider-specific Pulumi config. Defaults are safe examples where the provider has a public publisher model; account-specific endpoints stay in config or managed secrets.

Does the Azure variant create an Azure OpenAI account?

No. Azure OpenAI access, quota, and deployments are account-specific, so the starter stores references to an existing endpoint and deployment name in Key Vault and injects them into Azure Functions.

How are logs collected?

The runtime uses native platform logging. Lambda writes to CloudWatch Logs, Azure Functions writes through Application Insights, and Cloud Run writes to Cloud Logging.

How do I clean it up?

Run `pulumi destroy` against the same stack. Logs retained by the platform and externally supplied model deployments may outlive the stack, so review their retention and cleanup separately.

Deploy AI app infrastructure on GCP Vertex AI with Pulumi

This solution deploys the infrastructure around a small HTTP AI application: a runtime endpoint, logs, secure configuration, and a runtime identity allowed to invoke one managed model service. It keeps the blueprint small so you can swap in your own handler code without first adopting an application framework.

Use it when you need a provider-native starting point for a production-facing AI request path, not a data or training platform. The app receives a JSON request, forwards the prompt to the selected managed model service, returns the model text, and writes platform logs for operations.

Architecture

The stack has four pieces:

An HTTP endpoint on Cloud Run.
A runtime identity with the smallest model-service permission practical for Vertex AI.
Secure configuration through Cloud Run environment variables.
Native request and application logs in Cloud Logging.

The generated app code is small. Replace the sample prompt handler with your application logic after the stack proves that identity, model access, logs, and endpoint routing work in your account.

GCP Vertex AI notes

This variant uses GCP Vertex AI as the managed model backend and keeps the application endpoint provider-native. Account-specific model access is configured outside source code.

Prerequisites

You need:

a Pulumi account and the Pulumi CLI
a Google Cloud project with the Vertex AI and Cloud Run APIs enabled, and permission to create Cloud Run services, Artifact Registry repositories, service accounts, and IAM bindings
local Google Cloud credentials
Node.js 20 or newer (TypeScript starter)
Python 3.11 or newer (Python starter)

Download the blueprint

Use the Download blueprint button at the top of this page to download the GCP Vertex AI zip for the language selected in the chooser.

The TypeScript zip contains:

index.ts as the Pulumi entrypoint
components/ai-app.ts as the reusable component
runtime app code under the provider-specific folder
package.json and tsconfig.json for the Pulumi project

The Python zip contains:

__main__.py as the Pulumi entrypoint
components/ai_app.py as the reusable component
runtime app code under the provider-specific folder
requirements.txt for the Pulumi project

Unzip, change into the directory, and continue with the quickstart below.

Quickstart

Install Pulumi project dependencies, configure the stack, and deploy. This solution currently ships TypeScript and Python starters.

# 1. Install Pulumi project dependencies (TypeScript starter)
npm install

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set gcp:project <your-gcp-project-id>
pulumi config set gcp:region us-central1
pulumi config set modelId publishers/google/models/gemini-2.5-flash

# 3. Deploy
pulumi up

# 1. Install Pulumi project dependencies in a virtual environment (Python starter)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set gcp:project <your-gcp-project-id>
pulumi config set gcp:region us-central1
pulumi config set modelId publishers/google/models/gemini-2.5-flash

# 3. Deploy
pulumi up

The default modelId targets a Gemini publisher model in us-central1. Cloud Run receives the project, region, and model ID as environment variables.
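If you replace the sample handler with Python code, it might read those three values at startup along the following lines. This is a minimal sketch: the variable names MODEL_ID, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION are the ones the component sets, while RuntimeConfig and load_runtime_config are hypothetical names that do not ship with the blueprint.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    """Values the Cloud Run service receives as environment variables (hypothetical helper)."""
    project: str
    location: str
    model_id: str


def load_runtime_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Read the three variables and raise early if any is missing or empty."""
    source = os.environ if env is None else env
    names = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "MODEL_ID")
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return RuntimeConfig(
        project=source["GOOGLE_CLOUD_PROJECT"],
        location=source["GOOGLE_CLOUD_LOCATION"],
        model_id=source["MODEL_ID"],
    )
```

Failing at startup rather than on the first request makes a misconfigured revision visible in Cloud Logging as soon as the container boots.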

Walk through the stack

The component creates a Cloud Run service and exports endpointUrl so you can test the request path immediately after pulumi up. The sample handler accepts a JSON body with a prompt field and returns a JSON response with generated text or an operational error from the provider SDK.
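The request and response shapes described here can be exercised from a small client. The sketch below uses only the Python standard library and assumes the success response carries the generated text in a text field, as the sample handler does; build_prompt_request, extract_text, and ask are hypothetical helper names, and the shape of error responses is not assumed.

```python
import json
import urllib.request


def build_prompt_request(endpoint_url: str, prompt: str) -> urllib.request.Request:
    """Build the POST request the sample handler expects: a JSON body with a prompt field."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body: bytes) -> str:
    """Return the generated text from a successful JSON response."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError("response has no 'text' field")
    return payload["text"]


def ask(endpoint_url: str, prompt: str, timeout: float = 30.0) -> str:
    """Send one prompt to the deployed endpoint and return the model text."""
    request = build_prompt_request(endpoint_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(response.read())
```

Call ask with the value of the endpointUrl stack output. Non-2xx responses surface as urllib.error.HTTPError from urlopen.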

Cloud Run uses a dedicated service account with roles/aiplatform.user so the app can invoke Vertex AI publisher models.

Keep prompts and generated output out of stack outputs. Runtime values belong in request bodies and platform logs, not Pulumi state.

Sample request handler

service/server.js

The Cloud Run service. It forwards the prompt to Vertex AI with the generative model client and returns the generated text. This same service/ directory ships in the TypeScript and Python starters.

import express from "express";
import { VertexAI } from "@google-cloud/vertexai";

// Create the client and model handle once; the Pulumi component sets these environment variables.
const vertex = new VertexAI({ project: process.env.GOOGLE_CLOUD_PROJECT, location: process.env.GOOGLE_CLOUD_LOCATION });
const model = vertex.getGenerativeModel({ model: process.env.MODEL_ID });

const app = express();
app.use(express.json());
app.post("/", async (req, res) => {
  try {
    const result = await model.generateContent(req.body?.prompt || "Say hello from Vertex AI.");
    res.json({ text: result.response.candidates?.[0]?.content?.parts?.[0]?.text || "" });
  } catch (err) {
    // Log the SDK error to Cloud Logging and return it instead of leaving the request hanging.
    console.error(err);
    res.status(500).json({ error: err.message });
  }
});
app.listen(process.env.PORT || 8080);

Operate the endpoint

After deployment:

curl -X POST "$(pulumi stack output endpointUrl)" \
  -H "content-type: application/json" \
  -d '{"prompt":"Write one sentence about infrastructure as code."}'

The curl command posts a small JSON body to endpointUrl. Afterward, inspect Cloud Logging for the Cloud Run service to confirm that request and application logs arrive.
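Application logs are easier to filter in Cloud Logging when each entry is a single JSON line on stdout, because Cloud Run parses such lines as structured logs and treats the severity and message keys specially. The following is a minimal sketch for a Python handler. format_log_entry and log are hypothetical helpers, and the promptChars field in the usage note is an illustrative name.

```python
import json
import sys
from typing import Any


def format_log_entry(severity: str, message: str, **fields: Any) -> str:
    """Serialize one log entry as a single JSON line.

    Cloud Logging reads "severity" and "message" from the JSON line;
    any other keys are kept in the entry's structured payload.
    """
    entry = {"severity": severity, "message": message}
    entry.update(fields)
    return json.dumps(entry, separators=(",", ":"))


def log(severity: str, message: str, **fields: Any) -> None:
    """Write the entry to stdout, which Cloud Run forwards to Cloud Logging."""
    print(format_log_entry(severity, message, **fields), file=sys.stdout, flush=True)
```

For example, log("INFO", "prompt handled", promptChars=42) records request metadata without writing the prompt itself.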

Warning: The generated endpoint is public and unauthenticated by default. Anyone with the URL can call it, and leaked URLs can create model and token spend. Before production use, add auth, request validation, rate limits, and provider quota alarms.
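Two of the hardening steps named in the warning, request validation and rate limiting, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the blueprint: validate_prompt, TokenBucket, and the 4000-character cap are hypothetical. The bucket lives in process memory, so it limits each Cloud Run instance separately, not the service as a whole.

```python
import time
from typing import Any, Callable

MAX_PROMPT_CHARS = 4000  # hypothetical cap; tune to your model and budget


def validate_prompt(body: Any, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Return the prompt from a decoded JSON body or raise ValueError."""
    if not isinstance(body, dict):
        raise ValueError("body must be a JSON object")
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if len(prompt) > max_chars:
        raise ValueError(f"prompt exceeds {max_chars} characters")
    return prompt


class TokenBucket:
    """In-memory token bucket: bursts up to capacity, refilled at rate tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A handler would typically reject the request with HTTP 400 when validate_prompt raises and with HTTP 429 when allow() returns False. Authentication and provider quota alarms still need to be configured separately.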

This blueprint stays focused on runtime infrastructure and model invocation permissions.

Blueprint Pulumi program

The entrypoint reads stack config, creates the Vertex AI application component, and exports the HTTP endpoint, a log resource label, and the runtime identity.

import * as pulumi from "@pulumi/pulumi";
import { AiApp } from "./components/ai-app";
const config = new pulumi.Config();
const tags = { "pulumi:template": "ai-app-infrastructure", "pulumi:cloud": "gcp-vertex-ai", "pulumi:language": "typescript" };
const app = new AiApp("ai-app", {
  namePrefix: `${pulumi.getProject()}-${pulumi.getStack()}`,
  modelId: config.get("modelId") || "publishers/google/models/gemini-2.5-flash",
  tags,
});
export const endpointUrl = app.endpointUrl;
export const logResource = app.logResource;
export const runtimeIdentity = app.runtimeIdentity;

import pulumi
from components.ai_app import AiApp
config = pulumi.Config()
tags = {"pulumi:template": "ai-app-infrastructure", "pulumi:cloud": "gcp-vertex-ai", "pulumi:language": "python"}
app = AiApp(
    "ai-app",
    name_prefix=f"{pulumi.get_project()}-{pulumi.get_stack()}",
    model_id=config.get("modelId") or "publishers/google/models/gemini-2.5-flash",
    tags=tags,
)
pulumi.export("endpointUrl", app.endpoint_url)
pulumi.export("logResource", app.log_resource)
pulumi.export("runtimeIdentity", app.runtime_identity)

Reusable AI application component

The component provisions Cloud Run, Cloud Logging, secure configuration, and least-privilege runtime access to Vertex AI.

components/ai-app.ts

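Creates an Artifact Registry repository, builds and pushes the service image, and provisions the Cloud Run service, its dedicated service account, the Vertex AI IAM binding, and the public invoker binding.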

import * as dockerbuild from "@pulumi/docker-build";
import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";
export interface AiAppArgs { namePrefix: string; modelId: string; tags: Record<string, string>; }
export class AiApp extends pulumi.ComponentResource {
  public readonly endpointUrl: pulumi.Output<string>;
  public readonly logResource: pulumi.Output<string>;
  public readonly runtimeIdentity: pulumi.Output<string>;
  constructor(name: string, args: AiAppArgs, opts?: pulumi.ComponentResourceOptions) {
    super("guides:aiAppInfrastructure:GcpVertexAi", name, {}, opts);
    const region = gcp.config.region || "us-central1";
    const project = gcp.config.project!;
    const repositoryId = `${args.namePrefix}-images`.replace(/[^a-z0-9-]/g, "-").slice(0, 63);
    const repository = new gcp.artifactregistry.Repository(`${name}-repo`, { location: region, repositoryId, format: "DOCKER", description: "Container images for the AI app starter" }, { parent: this });
    const repoUrl = pulumi.interpolate`${region}-docker.pkg.dev/${project}/${repository.repositoryId}`;
    const image = new dockerbuild.Image(`${name}-image`, { tags: [pulumi.interpolate`${repoUrl}/service:latest`], context: { location: "service" }, platforms: ["linux/amd64"], push: true }, { parent: this });
    const serviceAccount = new gcp.serviceaccount.Account(`${name}-sa`, { accountId: `${args.namePrefix}-ai`.replace(/[^a-z0-9-]/g, "").slice(0, 28), displayName: "AI app runtime" }, { parent: this });
    new gcp.projects.IAMMember(`${name}-vertex`, { project, role: "roles/aiplatform.user", member: pulumi.interpolate`serviceAccount:${serviceAccount.email}` }, { parent: this });
    const service = new gcp.cloudrunv2.Service(`${name}-service`, {
      name: `${args.namePrefix}-ai`, location: region, deletionProtection: false, ingress: "INGRESS_TRAFFIC_ALL",
      template: {
        serviceAccount: serviceAccount.email,
        containers: [{
          image: image.ref,
          ports: { containerPort: 8080 },
          envs: [
            { name: "MODEL_ID", value: args.modelId },
            { name: "GOOGLE_CLOUD_LOCATION", value: region },
            { name: "GOOGLE_CLOUD_PROJECT", value: project },
          ],
        }],
      },
      traffics: [{ type: "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST", percent: 100 }],
    }, { parent: this });
    new gcp.cloudrunv2.ServiceIamMember(`${name}-invoker`, { name: service.name, location: service.location, role: "roles/run.invoker", member: "allUsers" }, { parent: this });
    this.endpointUrl = service.uri;
    this.logResource = pulumi.interpolate`Cloud Logging service ${service.name}`;
    this.runtimeIdentity = serviceAccount.email;
    this.registerOutputs({ endpointUrl: this.endpointUrl, logResource: this.logResource, runtimeIdentity: this.runtimeIdentity });
  }
}
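The TypeScript component above derives the Artifact Registry repository ID and the service account ID from namePrefix with two small string rules. The standalone Python sketch below mirrors those rules so you can check what a given project and stack name will produce. repository_id and service_account_id are hypothetical helper names and are not taken from the shipped Python component.

```python
import re


def repository_id(name_prefix: str) -> str:
    """Mirror the TypeScript rule: replace disallowed characters with '-' and keep 63 characters."""
    return re.sub(r"[^a-z0-9-]", "-", f"{name_prefix}-images")[:63]


def service_account_id(name_prefix: str) -> str:
    """Mirror the TypeScript rule: drop disallowed characters and keep 28 characters."""
    return re.sub(r"[^a-z0-9-]", "", f"{name_prefix}-ai")[:28]
```

Both rules treat uppercase letters as disallowed; they are not lowercased. Lowercase project and stack names therefore give the most readable resource IDs.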

components/ai_app.py