This solution deploys the infrastructure around a small HTTP AI application: a runtime endpoint, logs, secure configuration, and a runtime identity allowed to invoke one managed model service. It keeps the blueprint small so you can swap in your own handler code without first adopting an application framework.
Use it when you need a provider-native starting point for a production-facing AI request path, not a data or training platform. The app receives a JSON request, forwards the prompt to the selected managed model service, returns the model text, and writes platform logs for operations.
Architecture
The stack has four pieces:
- An HTTP endpoint on Cloud Run.
- A runtime identity with the smallest model-service permission practical for Vertex AI.
- Secure configuration through Cloud Run environment variables.
- Native request and application logs in Cloud Logging.
The generated app code is small. Replace the sample prompt handler with your application logic after the stack proves that identity, model access, logs, and endpoint routing work in your account.
GCP Vertex AI notes
This variant uses GCP Vertex AI as the managed model backend and keeps the application endpoint provider-native. Account-specific model access is configured outside source code.
Prerequisites
You need:
- a Pulumi account and the Pulumi CLI
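- a Google Cloud project with the Vertex AI, Cloud Run, and Artifact Registry APIs enabled, and permission to create Cloud Run services, Artifact Registry repositories, service accounts, and IAM bindings
- local Google Cloud credentials that Pulumi can use
- Docker running locally, because the component builds the service image and pushes it to Artifact Registry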
Download the blueprint
Use the Download blueprint button at the top of this page to grab the GCP Vertex AI zip for the language selected in the chooser. Each zip contains:
TypeScript:
- index.ts as the Pulumi entrypoint
- components/ai-app.ts as the reusable component
- runtime app code under the provider-specific folder
- package.json and tsconfig.json for the Pulumi project
Python:
- __main__.py as the Pulumi entrypoint
- components/ai_app.py as the reusable component
- runtime app code under the provider-specific folder
- requirements.txt for the Pulumi project
Unzip, change into the directory, and continue with the quickstart below.
Quickstart
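Install Pulumi project dependencies, configure the stack, and deploy. This solution currently ships TypeScript and Python starters: the first set of commands below is for TypeScript (npm install), the second for Python (virtual environment and pip).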
# 1. Install Pulumi project dependencies
npm install
# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set gcp:project <your-gcp-project-id>
pulumi config set gcp:region us-central1
pulumi config set modelId publishers/google/models/gemini-2.5-flash
# 3. Deploy
pulumi up
# 1. Install Pulumi project dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set gcp:project <your-gcp-project-id>
pulumi config set gcp:region us-central1
pulumi config set modelId publishers/google/models/gemini-2.5-flash
# 3. Deploy
pulumi up
The default modelId targets a Gemini publisher model in us-central1. Cloud Run receives the project, region, and model ID as environment variables.
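As a rough illustration of how a runtime could consume those three variables, the sketch below reads them from a mapping and fails early if one is missing. The variable names match the ones the component sets (GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, MODEL_ID). RuntimeConfig and load_runtime_config are hypothetical names used only for this example and are not part of the blueprint.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical container for the values Cloud Run injects."""
    project: str
    location: str
    model_id: str


def load_runtime_config(env: Mapping[str, str]) -> RuntimeConfig:
    """Read the three variables the stack sets on the Cloud Run service."""
    required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "MODEL_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail at startup instead of on the first model call.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return RuntimeConfig(
        project=env["GOOGLE_CLOUD_PROJECT"],
        location=env["GOOGLE_CLOUD_LOCATION"],
        model_id=env["MODEL_ID"],
    )
```

In a real process you would pass os.environ. Taking a mapping as the argument keeps the function easy to test without touching the environment.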
Walk through the stack
The component creates the Cloud Run service and exports endpointUrl so you can test the request path immediately after pulumi up. The sample handler accepts a JSON body with a prompt field and returns a JSON response containing either the generated text or an operational error from the provider SDK.
Cloud Run uses a dedicated service account with roles/aiplatform.user so the app can invoke Vertex AI publisher models.
Keep prompts and generated output out of stack outputs. Runtime values belong in request bodies and platform logs, not Pulumi state.
Sample request handler
service/server.js
The Cloud Run service. It forwards the prompt to Vertex AI with the generative model client and returns the generated text. This same service/ directory ships in the TypeScript and Python starters.
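import express from "express";
import { VertexAI } from "@google-cloud/vertexai";

const app = express();
app.use(express.json());

app.post("/", async (req, res) => {
  try {
    const vertex = new VertexAI({ project: process.env.GOOGLE_CLOUD_PROJECT, location: process.env.GOOGLE_CLOUD_LOCATION });
    const model = vertex.getGenerativeModel({ model: process.env.MODEL_ID });
    // Fall back to a fixed prompt when the body has no prompt field.
    const result = await model.generateContent(req.body?.prompt || "Say hello from Vertex AI.");
    res.json({ text: result.response.candidates?.[0]?.content?.parts?.[0]?.text || "" });
  } catch (err) {
    // Log provider SDK failures and return them as a JSON error instead of leaving the request hanging.
    console.error(err);
    res.status(500).json({ error: err instanceof Error ? err.message : String(err) });
  }
});

app.listen(process.env.PORT || 8080);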
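To make the request contract explicit (a JSON body with a prompt field in, generated text or an error out), here is a minimal Python sketch of the same flow. It is an illustration rather than code shipped in the blueprint: handle_prompt_request and the injected generate callable are hypothetical, and the 400 and 500 status codes are assumptions of this sketch.

```python
import json
from typing import Callable, Tuple

DEFAULT_PROMPT = "Say hello from Vertex AI."


def handle_prompt_request(raw_body: bytes, generate: Callable[[str], str]) -> Tuple[int, dict]:
    """Return (HTTP status, JSON-serializable body) for one POST request."""
    try:
        body = json.loads(raw_body or b"{}")
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return 400, {"error": "request body must be valid JSON"}
    if not isinstance(body, dict):
        return 400, {"error": "request body must be a JSON object"}
    prompt = body.get("prompt") or DEFAULT_PROMPT
    try:
        text = generate(prompt)  # stands in for the managed model call
    except Exception as exc:
        # Surface the provider error to the caller as JSON.
        return 500, {"error": str(exc)}
    return 200, {"text": text or ""}
```

Passing the model call in as a function keeps the sketch independent of any SDK, so the routing and error handling can be exercised with a stub.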
Operate the endpoint
After deployment:
curl -X POST "$(pulumi stack output endpointUrl)" \
-H "content-type: application/json" \
-d '{"prompt":"Write one sentence about infrastructure as code."}'
The handler only answers POST requests, so send a small JSON body to endpointUrl as shown, then inspect Cloud Logging for the Cloud Run service's request and application logs.
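If you prefer to call the endpoint from a script, the sketch below builds the same POST request with the Python standard library. build_prompt_request and ask are hypothetical helper names, and ask assumes a successful response carries a text field as described for the sample handler.

```python
import json
import urllib.request


def build_prompt_request(endpoint_url: str, prompt: str) -> urllib.request.Request:
    """Build the JSON POST request the sample handler expects."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint_url: str, prompt: str, timeout: float = 30.0) -> str:
    """Send the prompt and return the generated text from the response body."""
    request = build_prompt_request(endpoint_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["text"]
```

Pass the value of pulumi stack output endpointUrl as endpoint_url. urlopen raises urllib.error.HTTPError on non-2xx responses, so callers that want the error body need to catch it.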
Warning: The generated endpoint is public and unauthenticated by default. Anyone with the URL can call it, and leaked URLs can create model and token spend. Before production use, add auth, request validation, rate limits, and provider quota alarms.
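Two of those safeguards, request validation and rate limiting, can be sketched in a few lines. The example below is a starting point under stated assumptions, not a recommended production design: validate_prompt, TokenBucket, and the 4000-character limit are all hypothetical choices made for this illustration.

```python
import time
from typing import Callable

MAX_PROMPT_CHARS = 4000  # assumed limit; tune it for your model and budget


def validate_prompt(body: object) -> str:
    """Return the prompt if the parsed JSON body is acceptable, else raise ValueError."""
    if not isinstance(body, dict):
        raise ValueError("body must be a JSON object")
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt


class TokenBucket:
    """In-memory limiter: bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available and report whether the request may proceed."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

An in-memory bucket only limits a single container instance. Cloud Run can run several instances at once, so a shared limit needs an external store or a gateway in front of the service.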
This blueprint stays focused on runtime infrastructure and model invocation permissions.
Blueprint Pulumi program
The entrypoint reads stack config, creates the Vertex AI application component, and exports the HTTP endpoint plus observability handles.
import * as pulumi from "@pulumi/pulumi";
import { AiApp } from "./components/ai-app";
const config = new pulumi.Config();
const tags = { "pulumi:template": "ai-app-infrastructure", "pulumi:cloud": "gcp-vertex-ai", "pulumi:language": "typescript" };
const app = new AiApp("ai-app", { namePrefix: `${pulumi.getProject()}-${pulumi.getStack()}`, modelId: config.get("modelId") || "publishers/google/models/gemini-2.5-flash", tags });
export const endpointUrl = app.endpointUrl;
export const logResource = app.logResource;
export const runtimeIdentity = app.runtimeIdentity;
import pulumi
from components.ai_app import AiApp
config = pulumi.Config()
tags = {"pulumi:template": "ai-app-infrastructure", "pulumi:cloud": "gcp-vertex-ai", "pulumi:language": "python"}
app = AiApp("ai-app", name_prefix=f"{pulumi.get_project()}-{pulumi.get_stack()}", model_id=config.get("modelId") or "publishers/google/models/gemini-2.5-flash", tags=tags)
pulumi.export("endpointUrl", app.endpoint_url)
pulumi.export("logResource", app.log_resource)
pulumi.export("runtimeIdentity", app.runtime_identity)
Reusable AI application component
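The component provisions an Artifact Registry repository, a container image built from the service/ directory, a Cloud Run service configured through environment variables, and a dedicated service account with runtime access to Vertex AI. It creates no separate logging resource: Cloud Run sends request and application logs to Cloud Logging on its own.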
components/ai-app.ts
Creates Cloud Run, Cloud Logging, secure config, and model invocation IAM for GCP Vertex AI.
import * as dockerbuild from "@pulumi/docker-build";
import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";

export interface AiAppArgs { namePrefix: string; modelId: string; tags: Record<string, string>; }
export class AiApp extends pulumi.ComponentResource {
  public readonly endpointUrl: pulumi.Output<string>;
  public readonly logResource: pulumi.Output<string>;
  public readonly runtimeIdentity: pulumi.Output<string>;
  constructor(name: string, args: AiAppArgs, opts?: pulumi.ComponentResourceOptions) {
    super("guides:aiAppInfrastructure:GcpVertexAi", name, {}, opts);
    const region = gcp.config.region || "us-central1";
    const project = gcp.config.project!;
    // Artifact Registry repository and the image built from service/.
    const repositoryId = `${args.namePrefix}-images`.replace(/[^a-z0-9-]/g, "-").slice(0, 63);
    const repository = new gcp.artifactregistry.Repository(`${name}-repo`, { location: region, repositoryId, format: "DOCKER", description: "Container images for the AI app starter" }, { parent: this });
    const repoUrl = pulumi.interpolate`${region}-docker.pkg.dev/${project}/${repository.repositoryId}`;
    const image = new dockerbuild.Image(`${name}-image`, { tags: [pulumi.interpolate`${repoUrl}/service:latest`], context: { location: "service" }, platforms: ["linux/amd64"], push: true }, { parent: this });
    // Dedicated runtime identity with permission to call Vertex AI.
    const serviceAccount = new gcp.serviceaccount.Account(`${name}-sa`, { accountId: `${args.namePrefix}-ai`.replace(/[^a-z0-9-]/g, "").slice(0, 28), displayName: "AI app runtime" }, { parent: this });
    new gcp.projects.IAMMember(`${name}-vertex`, { project, role: "roles/aiplatform.user", member: pulumi.interpolate`serviceAccount:${serviceAccount.email}` }, { parent: this });
    const service = new gcp.cloudrunv2.Service(`${name}-service`, {
      name: `${args.namePrefix}-ai`, location: region, deletionProtection: false, ingress: "INGRESS_TRAFFIC_ALL",
      template: {
        serviceAccount: serviceAccount.email,
        containers: [{
          image: image.ref,
          ports: { containerPort: 8080 },
          envs: [{ name: "MODEL_ID", value: args.modelId }, { name: "GOOGLE_CLOUD_LOCATION", value: region }, { name: "GOOGLE_CLOUD_PROJECT", value: project }],
        }],
      },
      traffics: [{ type: "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST", percent: 100 }],
    }, { parent: this });
    // Public access: allUsers may invoke the service.
    new gcp.cloudrunv2.ServiceIamMember(`${name}-invoker`, { name: service.name, location: service.location, role: "roles/run.invoker", member: "allUsers" }, { parent: this });
    this.endpointUrl = service.uri;
    this.logResource = pulumi.interpolate`Cloud Logging service ${service.name}`;
    this.runtimeIdentity = serviceAccount.email;
    this.registerOutputs({ endpointUrl: this.endpointUrl, logResource: this.logResource, runtimeIdentity: this.runtimeIdentity });
  }
}
components/ai_app.py
Creates Cloud Run, Cloud Logging, secure config, and model invocation IAM for GCP Vertex AI.
import re
import pulumi
import pulumi_docker_build as docker_build
import pulumi_gcp as gcp


class AiApp(pulumi.ComponentResource):
    def __init__(self, name, name_prefix, model_id, tags, opts=None):
        super().__init__("guides:aiAppInfrastructure:GcpVertexAi", name, None, opts)
        child = pulumi.ResourceOptions(parent=self)
        region = gcp.config.region or "us-central1"
        project = gcp.config.project
        repository_id = re.sub(r"[^a-z0-9-]", "-", f"{name_prefix}-images")[:63]
        repository = gcp.artifactregistry.Repository(f"{name}-repo", location=region, repository_id=repository_id, format="DOCKER", description="Container images for the AI app starter", opts=child)
        repo_url = pulumi.Output.concat(region, "-docker.pkg.dev/", project, "/", repository.repository_id)
        image = docker_build.Image(f"{name}-image", tags=[pulumi.Output.concat(repo_url, "/service:latest")], context=docker_build.BuildContextArgs(location="service"), platforms=[docker_build.Platform.LINUX_AMD64], push=True, opts=child)
        account_id = re.sub(r"[^a-z0-9-]", "", f"{name_prefix}-ai")[:28]
        service_account = gcp.serviceaccount.Account(f"{name}-sa", account_id=account_id, display_name="AI app runtime", opts=child)
        gcp.projects.IAMMember(f"{name}-vertex", project=project, role="roles/aiplatform.user", member=pulumi.Output.concat("serviceAccount:", service_account.email), opts=child)
        service = gcp.cloudrunv2.Service(
            f"{name}-service",
            name=f"{name_prefix}-ai",
            location=region,
            deletion_protection=False,
            ingress="INGRESS_TRAFFIC_ALL",
            template=gcp.cloudrunv2.ServiceTemplateArgs(
                service_account=service_account.email,
                containers=[gcp.cloudrunv2.ServiceTemplateContainerArgs(
                    image=image.ref,
                    ports=gcp.cloudrunv2.ServiceTemplateContainerPortsArgs(container_port=8080),
                    envs=[
                        gcp.cloudrunv2.ServiceTemplateContainerEnvArgs(name="MODEL_ID", value=model_id),
                        gcp.cloudrunv2.ServiceTemplateContainerEnvArgs(name="GOOGLE_CLOUD_LOCATION", value=region),
                        gcp.cloudrunv2.ServiceTemplateContainerEnvArgs(name="GOOGLE_CLOUD_PROJECT", value=project),
                    ],
                )],
            ),
            traffics=[gcp.cloudrunv2.ServiceTrafficArgs(type="TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST", percent=100)],
            opts=child,
        )
        gcp.cloudrunv2.ServiceIamMember(f"{name}-invoker", name=service.name, location=service.location, role="roles/run.invoker", member="allUsers", opts=child)
        self.endpoint_url = service.uri
        self.log_resource = pulumi.Output.concat("Cloud Logging service ", service.name)
        self.runtime_identity = service_account.email
        self.register_outputs({"endpointUrl": self.endpoint_url, "logResource": self.log_resource, "runtimeIdentity": self.runtime_identity})
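Both starters derive the repository ID and the service account ID from the project and stack name with a regular expression. The sketch below isolates those two expressions so you can see what a given prefix turns into before deploying. The function names repository_id and account_id are hypothetical wrappers for this illustration; the expressions inside them are copied from the component.

```python
import re


def repository_id(name_prefix: str) -> str:
    # Same expression as the component: disallowed characters become "-", capped at 63 characters.
    return re.sub(r"[^a-z0-9-]", "-", f"{name_prefix}-images")[:63]


def account_id(name_prefix: str) -> str:
    # Same expression as the component: disallowed characters are dropped, capped at 28 characters.
    return re.sub(r"[^a-z0-9-]", "", f"{name_prefix}-ai")[:28]


# A lowercase prefix passes through unchanged.
print(repository_id("ai-app-dev"))  # ai-app-dev-images
print(account_id("ai-app-dev"))     # ai-app-dev-ai

# Uppercase letters and underscores are replaced or dropped, not lowercased.
print(repository_id("My_App-dev"))  # -y--pp-dev-images
print(account_id("My_App-dev"))     # ypp-dev-ai
```

The second pair shows why lowercase project and stack names are the safer choice: the expressions do not lowercase their input, so a mixed-case prefix yields IDs that differ from what you might expect and that the provider may reject. The Cloud Run service name uses the prefix without any sanitizing.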