Load GitHub events into Snowflake on GCP with batch COPY INTO

Build a GCP-based webhook ingestion path that lands GitHub events in Snowflake with batch COPY INTO, plus blueprint downloads, reusable component code, and operating notes.

Download blueprint

Get this GCP + Batch COPY INTO + GitHub blueprint project as a zip. Each download contains the matching Pulumi program, dependency files, and README; pick the language that matches the install commands and blueprint program you plan to follow on this page.

Download Python blueprint

Download TypeScript blueprint

This guide shows how to land GitHub events in Snowflake on GCP with batch COPY INTO.

It covers the storage, eventing, and Snowflake objects for this setup so you can get raw events flowing first and shape downstream tables later.

On GCP, this guide uses Cloud Functions (2nd gen) for the public handler and Google Cloud Storage for the raw landing zone. The loading path for this variant (batch COPY INTO) carries those objects into Snowflake.

The batch COPY INTO path stages files first and leaves loading under your control. Choose it when you want to run COPY INTO on a predictable cadence and keep the raw payloads in object storage before Snowflake loads them. This blueprint provisions the public Cloud Function endpoint, the GCS landing zone, a Snowflake storage integration and external GCS stage, an X-Small warehouse, and a Snowflake task that runs COPY INTO on a configurable cadence (every 60 minutes by default).

In the blueprint, the top-level program creates the Snowflake database, schema, and warehouse, then passes those names into the reusable batch component. That means you can keep the blueprint-created objects, rename them, or swap them for your own existing database, schema, and warehouse without rewriting the loading logic. The same entrypoint also sets taskIntervalMinutes, which controls how often the Snowflake task runs COPY INTO.

If you want to change those defaults before deploying, set stack config like this:

pulumi config set database LANDING_ZONE_WEBHOOKS
pulumi config set schema RAW
pulumi config set warehouse WEBHOOK_BATCH_LOADER
pulumi config set taskIntervalMinutes 60

This guide provisions the GitHub webhook for you and points it at the deployed public endpoint so repository events land in Snowflake on the first deploy.

Quickstart

  1. Download the blueprint zip for your language below, or create a new Pulumi project with the same file layout shown in the Download section.
  2. Install dependencies for your selected language and configure Snowflake plus GCP.
  3. For batch setups, decide whether you want to keep the blueprint database, schema, and warehouse names or point the program at names you already use. If needed, also change taskIntervalMinutes to the cadence you want.
  4. Deploy the stack to create the public Cloud Function endpoint, the GCS landing bucket, and the Snowflake loading objects.
  5. Confirm the webhook registration on the repository (the blueprint registers it for you) and send a test event, for example by pushing a commit or starring the repo.
  6. Query the landing table in Snowflake to confirm the event arrived.

After the first test event, new rows usually appear within an hour because the Snowflake task runs on an hourly cadence by default. Lower taskIntervalMinutes if you want faster loads.

Prerequisites

  • a Pulumi account and the Pulumi CLI
  • a Google Cloud project where you can create Cloud Functions, Cloud Storage buckets, Pub/Sub topics, and IAM bindings
  • a Snowflake account where you can create databases, schemas, and loading roles
  • a GitHub personal access token or app credential that can manage repository webhooks
  • the GitHub repository name you want to connect, set with pulumi config set webhook-repo <repo>

Depending on the Pulumi language you choose, you also need:

  • Python 3.11 or newer and a virtual environment tool
  • Node.js 20 or newer and npm

Initialize your stack for GCP with:

pulumi stack init dev
pulumi config set gcp:project 123456789012
pulumi config set gcp:region us-central1

Set up credentials with Pulumi ESC

This guide needs cloud credentials, Snowflake credentials, and any source-specific token required to provision the webhook. A single ESC environment is usually the smallest setup that still keeps secrets out of local files.

values:
  gcp:
    login:
      fn::open::gcp-login:
        project: 123456789012
        oidc:
          workloadPoolId: pulumi-esc
          providerId: pulumi-esc
          serviceAccount: pulumi-esc@example-project.iam.gserviceaccount.com
  snowflake:
    login:
      fn::open::snowflake-login:
        oidc:
          account: <your-snowflake-account>
          user: ESC_SERVICE_USER
    organizationName: <your-org-name>
    accountName: <your-account-name>
  github:
    token:
      fn::secret: <your-github-token>
    owner: <your-github-org-or-user>
  environmentVariables:
    SNOWFLAKE_USER: ${snowflake.login.user}
    SNOWFLAKE_TOKEN: ${snowflake.login.token}
  pulumiConfig:
    snowflake:organizationName: ${snowflake.organizationName}
    snowflake:accountName: ${snowflake.accountName}
    snowflake:authenticator: OAUTH
    snowflake:role: PULUMI_DEPLOYER
    gcp:project: ${gcp.login.project}
    gcp:region: us-central1
    gcp:accessToken: ${gcp.login.accessToken}

    github:token: ${github.token}
    github:owner: ${github.owner}

Then reference it from your stack config:

environment:
  - <your-org>/<your-environment>
config:
  webhook-to-snowflake:database: LANDING_ZONE_WEBHOOKS

What you get in the download

The downloadable example zip includes:

  • Pulumi.yaml
  • the Pulumi program, dependency files, cloud runtime support files, and reusable components for the language you pick below
  • a README with a shorter quick start for this exact setup

For batch setups, the top-level program is where you choose the Snowflake database, schema, warehouse, and task cadence. The reusable batch component then builds the stage, destination table, and scheduled load inside those objects.

The Python blueprint contains:

  • __main__.py as the Pulumi entrypoint
  • components/webhook_ingestion.py for the public webhook endpoint
  • components/batch_pipeline.py for the Snowflake loading path
  • cloudfunction/main.py for request validation and GCS writes
  • cloudfunction/requirements.txt for the Cloud Functions runtime dependencies
  • requirements.txt for the root Pulumi project

The TypeScript blueprint contains:

  • index.ts as the Pulumi entrypoint
  • components/webhook_ingestion.ts for the public webhook endpoint
  • components/batch_pipeline.ts for the Snowflake loading path
  • cloudfunction/main.py for request validation and GCS writes
  • cloudfunction/requirements.txt for the Cloud Functions runtime dependencies
  • package.json and tsconfig.json for the root Pulumi project

The next sections show the same entrypoint and component files that ship in the download.

Blueprint Pulumi program

This blueprint shows the full resource wiring for the GCP batch COPY INTO path with a GitHub source. The downloadable repo uses the same entrypoint and component files shown below.

__main__.py

import pulumi
import pulumi_gcp as gcp
import pulumi_random as random
import pulumi_snowflake as snowflake
import pulumi_github as github

from components.batch_pipeline import BatchPipeline
from components.webhook_ingestion import WebhookIngestion

config = pulumi.Config()
database_name = config.get("database") or "LANDING_ZONE_WEBHOOKS"
schema_name = config.get("schema") or "RAW"
warehouse_name = config.get("warehouse") or "WEBHOOK_BATCH_LOADER"
task_interval_minutes = config.get_int("taskIntervalMinutes") or 60
project = pulumi.Config("gcp").require("project")
region = pulumi.Config("gcp").require("region")
webhook_repo = config.require("webhook-repo")

landing_bucket = gcp.storage.Bucket(
    "landing-bucket",
    location=region,
    project=project,
    uniform_bucket_level_access=True,
    public_access_prevention="enforced",
)
database = snowflake.Database("landing-db", name=database_name)
schema = snowflake.Schema("raw-schema", name=schema_name, database=database.name)
warehouse = snowflake.Warehouse(
    "batch-loader-warehouse",
    name=warehouse_name,
    warehouse_size="XSMALL",
    auto_resume="true",
    auto_suspend=60,
    initially_suspended=False,
)
webhook_secret = random.RandomPassword("webhook-secret", length=32, special=False)

ingestion = WebhookIngestion(
    "source-webhooks",
    bucket_name=landing_bucket.name,
    project=project,
    region=region,
    webhook_secret=webhook_secret.result,
)

pipeline = BatchPipeline(
    "source-events",
    bucket_name=landing_bucket.name,
    database=database.name,
    schema_name=schema.name,
    warehouse_name=warehouse.name,
    task_interval_minutes=task_interval_minutes,
)

endpoint_url = ingestion.endpoint_url

github.RepositoryWebhook(
    "source-webhook",
    repository=webhook_repo,
    configuration=github.RepositoryWebhookConfigurationArgs(
        url=endpoint_url,
        content_type="json",
        secret=webhook_secret.result,
    ),
    events=["push", "pull_request", "issues", "star"],
)

pulumi.export("landing_bucket_name", landing_bucket.name)
pulumi.export("stage_name", pipeline.stage_name)
pulumi.export("table_name", pipeline.table_name)
pulumi.export("warehouse_name", warehouse.name)
pulumi.export("task_name", pipeline.task_name)

index.ts

import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";
import * as random from "@pulumi/random";
import * as snowflake from "@pulumi/snowflake";
import * as github from "@pulumi/github";

import { BatchPipeline } from "./components/batch_pipeline";
import { WebhookIngestion } from "./components/webhook_ingestion";

const config = new pulumi.Config();
const databaseName = config.get("database") ?? "LANDING_ZONE_WEBHOOKS";
const schemaName = config.get("schema") ?? "RAW";
const warehouseConfigName = config.get("warehouse") ?? "WEBHOOK_BATCH_LOADER";
const taskIntervalMinutes = config.getNumber("taskIntervalMinutes") ?? 60;
const project = new pulumi.Config("gcp").require("project");
const region = new pulumi.Config("gcp").require("region");
const webhookRepo = config.require("webhook-repo");

const landingBucket = new gcp.storage.Bucket("landing-bucket", {
    location: region,
    project,
    uniformBucketLevelAccess: true,
    publicAccessPrevention: "enforced",
});
const database = new snowflake.Database("landing-db", { name: databaseName });
const schema = new snowflake.Schema("raw-schema", {
    name: schemaName,
    database: database.name,
});
const warehouse = new snowflake.Warehouse("batch-loader-warehouse", {
    name: warehouseConfigName,
    warehouseSize: "XSMALL",
    autoResume: "true",
    autoSuspend: 60,
    initiallySuspended: false,
});
const sharedSecret = new random.RandomPassword("webhook-secret", {
    length: 32,
    special: false,
});

const ingestion = new WebhookIngestion("source-webhooks", {
    bucketName: landingBucket.name,
    project,
    region,
    webhookSecret: sharedSecret.result,
});

const pipeline = new BatchPipeline("source-events", {
    bucketName: landingBucket.name,
    database: database.name,
    schemaName: schema.name,
    warehouseName: warehouse.name,
    taskIntervalMinutes,
});

const endpointUrl = ingestion.endpointUrl;

new github.RepositoryWebhook("source-webhook", {
    repository: webhookRepo,
    configuration: {
        url: endpointUrl,
        contentType: "json",
        secret: sharedSecret.result,
    },
    events: ["push", "pull_request", "issues", "star"],
});

export const landingBucketName = landingBucket.name;
export const stageName = pipeline.stageName;
export const tableName = pipeline.tableName;
export const warehouseName = warehouse.name;
export const taskName = pipeline.taskName;

Reusable components

The entrypoint stays small because the real ingestion work lives in reusable modules. These are the same component files packaged in the downloadable blueprint for this setup.

components/webhook_ingestion.py

Provisions the public webhook endpoint for this setup: a Cloud Functions (2nd gen) handler, packaged from cloudfunction/main.py, that accepts the request, validates the signature, normalizes the payload, and writes the raw event into the landing path.

from __future__ import annotations

import base64
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile

import pulumi
import pulumi_gcp as gcp
import pulumi_random as random


def _create_function_archive(name: str) -> tuple[str, str]:
    archive_path = Path(tempfile.gettempdir()) / f"{name}-cloudfunction.zip"
    with ZipFile(archive_path, "w", compression=ZIP_DEFLATED) as archive:
        archive.write(Path("cloudfunction/main.py"), arcname="main.py")
        archive.write(Path("cloudfunction/requirements.txt"), arcname="requirements.txt")

    archive_bytes = archive_path.read_bytes()
    source_md5hash = base64.b64encode(hashlib.md5(archive_bytes).digest()).decode("utf-8")
    return str(archive_path), source_md5hash


@dataclass
class WebhookIngestion:
    endpoint_url: pulumi.Output[str]
    function_name: pulumi.Output[str]

    def __init__(
        self,
        name: str,
        *,
        bucket_name: pulumi.Input[str],
        project: pulumi.Input[str],
        region: pulumi.Input[str],
        webhook_secret: pulumi.Input[str],
    ) -> None:
        archive_path, source_md5hash = _create_function_archive(name)

        service_account_suffix = random.RandomString(
            f"{name}-service-account-suffix",
            length=8,
            special=False,
            upper=False,
        )
        service_account = gcp.serviceaccount.Account(
            f"{name}-service-account",
            account_id=service_account_suffix.result.apply(lambda value: f"w2sf-{value}"),
            display_name="Webhook ingestion function",
        )

        gcp.storage.BucketIAMMember(
            f"{name}-bucket-writer",
            bucket=bucket_name,
            role="roles/storage.objectCreator",
            member=service_account.email.apply(lambda email: f"serviceAccount:{email}"),
        )

        source_object = gcp.storage.BucketObject(
            f"{name}-source-object",
            bucket=bucket_name,
            name="deployments/cloudfunction-source.zip",
            source=pulumi.FileAsset(archive_path),
            source_md5hash=source_md5hash,
        )

        function = gcp.cloudfunctionsv2.Function(
            f"{name}-function",
            name=f"webhook-to-snowflake-{name}",
            location=region,
            build_config=gcp.cloudfunctionsv2.FunctionBuildConfigArgs(
                runtime="python311",
                entry_point="webhook",
                source=gcp.cloudfunctionsv2.FunctionBuildConfigSourceArgs(
                    storage_source=gcp.cloudfunctionsv2.FunctionBuildConfigSourceStorageSourceArgs(
                        bucket=bucket_name,
                        object=source_object.name,
                    )
                ),
            ),
            service_config=gcp.cloudfunctionsv2.FunctionServiceConfigArgs(
                available_memory="256M",
                timeout_seconds=30,
                ingress_settings="ALLOW_ALL",
                all_traffic_on_latest_revision=True,
                service_account_email=service_account.email,
                environment_variables={
                    "LANDING_BUCKET": bucket_name,
                    "LANDING_PREFIX": "incoming",
                    "WEBHOOK_SECRET": webhook_secret,
                },
            ),
        )

        gcp.cloudfunctionsv2.FunctionIamMember(
            f"{name}-function-invoker",
            project=project,
            location=function.location,
            cloud_function=function.name,
            role="roles/cloudfunctions.invoker",
            member="allUsers",
        )

        gcp.cloudrun.IamMember(
            f"{name}-run-invoker",
            project=project,
            location=function.location,
            service=function.name,
            role="roles/run.invoker",
            member="allUsers",
        )

        self.endpoint_url = function.service_config.uri
        self.function_name = function.name
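
The component zips and deploys cloudfunction/main.py, which is not reproduced on this page. As orientation only, here is a minimal sketch of what such a handler could look like, assuming GitHub's X-Hub-Signature-256 HMAC scheme and the environment variables the component sets (LANDING_BUCKET, LANDING_PREFIX, WEBHOOK_SECRET); the shipped file may differ:

# Hypothetical sketch of cloudfunction/main.py; the blueprint's actual handler may differ.
import hashlib
import hmac
import json
import os
import uuid
from datetime import datetime, timezone

import functions_framework
from google.cloud import storage


@functions_framework.http
def webhook(request):
    secret = os.environ["WEBHOOK_SECRET"].encode()
    body = request.get_data()

    # GitHub signs the payload with HMAC-SHA256 and sends it in X-Hub-Signature-256.
    signature = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return ("invalid signature", 401)

    # Land the raw payload under the incoming/ prefix that the external stage reads.
    bucket = storage.Client().bucket(os.environ["LANDING_BUCKET"])
    prefix = os.environ.get("LANDING_PREFIX", "incoming")
    event_type = request.headers.get("X-GitHub-Event", "unknown")
    object_name = f"{prefix}/{datetime.now(timezone.utc):%Y/%m/%d}/{event_type}-{uuid.uuid4()}.json"
    bucket.blob(object_name).upload_from_string(body, content_type="application/json")

    return (json.dumps({"stored": object_name}), 200)

A handler along these lines would need functions-framework and google-cloud-storage listed in cloudfunction/requirements.txt.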

components/batch_pipeline.py

Creates the Snowflake-side loading resources for this setup: the storage integration and external GCS stage over the landing bucket, the destination table, and the scheduled task that runs batch COPY INTO.

from __future__ import annotations

from dataclasses import dataclass

import pulumi
import pulumi_gcp as gcp
import pulumi_snowflake as snowflake


def _copy_into_statement(database: pulumi.Input[str], schema_name: pulumi.Input[str]) -> pulumi.Output[str]:
    return pulumi.Output.all(database, schema_name).apply(
        lambda args: (
            f'COPY INTO "{args[0]}"."{args[1]}"."WEBHOOK_EVENTS" '
            f'FROM (SELECT metadata$filename, metadata$file_last_modified, $1, sysdate() '
            f'FROM @"{args[0]}"."{args[1]}"."WEBHOOK_EVENTS_STAGE") '
            "FILE_FORMAT = (TYPE = JSON)"
        )
    )


@dataclass
class BatchPipeline:
    stage_name: pulumi.Output[str]
    table_name: pulumi.Output[str]
    task_name: pulumi.Output[str]

    def __init__(
        self,
        name: str,
        *,
        bucket_name: pulumi.Input[str],
        database: pulumi.Input[str],
        schema_name: pulumi.Input[str],
        warehouse_name: pulumi.Input[str],
        task_interval_minutes: int,
    ) -> None:
        preview_provider = snowflake.Provider(
            f"{name}-preview-provider",
            preview_features_enabled=[
                "snowflakeStageExternalGcsResource",
                "snowflakeStorageIntegrationResource",
                "snowflakeTableResource",
            ],
        )
        preview_opts = pulumi.ResourceOptions(provider=preview_provider)

        table = snowflake.Table(
            f"{name}-table",
            database=database,
            schema=schema_name,
            name="WEBHOOK_EVENTS",
            columns=[
                snowflake.TableColumnArgs(name="FILENAME", type="STRING", nullable=False),
                snowflake.TableColumnArgs(
                    name="LAST_MODIFIED_AT",
                    type="TIMESTAMP_NTZ",
                    nullable=False,
                ),
                snowflake.TableColumnArgs(name="CONTENT", type="VARIANT"),
                snowflake.TableColumnArgs(name="LOADED_AT", type="TIMESTAMP_NTZ"),
            ],
            opts=preview_opts,
        )

        stage_url = pulumi.Output.from_input(bucket_name).apply(
            lambda current: f"gcs://{current}/incoming/"
        )
        storage_integration = snowflake.StorageIntegration(
            f"{name}-storage-integration",
            name="WEBHOOK_EVENTS_STORAGE_INTEGRATION",
            enabled=True,
            storage_provider="GCS",
            storage_allowed_locations=[stage_url],
            opts=preview_opts,
        )

        storage_member = storage_integration.storage_gcp_service_account.apply(
            lambda service_account: f"serviceAccount:{service_account}"
        )
        bucket_reader = gcp.storage.BucketIAMMember(
            f"{name}-bucket-reader",
            bucket=bucket_name,
            role="roles/storage.objectViewer",
            member=storage_member,
        )
        bucket_metadata_reader = gcp.storage.BucketIAMMember(
            f"{name}-bucket-metadata-reader",
            bucket=bucket_name,
            role="roles/storage.legacyBucketReader",
            member=storage_member,
        )

        stage = snowflake.StageExternalGcs(
            f"{name}-stage",
            database=database,
            schema=schema_name,
            name="WEBHOOK_EVENTS_STAGE",
            url=stage_url,
            storage_integration=storage_integration.name,
            opts=pulumi.ResourceOptions.merge(
                preview_opts,
                pulumi.ResourceOptions(depends_on=[bucket_reader, bucket_metadata_reader]),
            ),
        )

        task = snowflake.Task(
            f"{name}-task",
            database=database,
            schema=schema_name,
            name="WEBHOOK_EVENTS_TASK",
            warehouse=warehouse_name,
            started=True,
            schedule={"minutes": task_interval_minutes},
            sql_statement=_copy_into_statement(database, schema_name),
        )

        self.stage_name = stage.fully_qualified_name
        self.table_name = table.name
        self.task_name = task.name
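
Because the component only takes names as inputs, you can also point it at Snowflake objects you already manage, as noted earlier: drop the blueprint's Database, Schema, and Warehouse resources from the entrypoint and pass your own names instead. A minimal sketch, where ANALYTICS, RAW, and LOADER_WH are hypothetical placeholders:

# Hypothetical: reuse existing Snowflake objects instead of the blueprint-created ones.
pipeline = BatchPipeline(
    "source-events",
    bucket_name=landing_bucket.name,
    database="ANALYTICS",
    schema_name="RAW",
    warehouse_name="LOADER_WH",
    task_interval_minutes=task_interval_minutes,
)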

components/webhook_ingestion.ts

Provisions the same public webhook endpoint from TypeScript, packaging the cloudfunction/main.py handler that accepts the request, validates the signature, normalizes the payload, and writes the raw event into the landing path.

import AdmZip from "adm-zip";
import * as crypto from "crypto";
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";
import * as random from "@pulumi/random";

export interface WebhookIngestionArgs {
    bucketName: pulumi.Input<string>;
    project: pulumi.Input<string>;
    region: pulumi.Input<string>;
    webhookSecret: pulumi.Input<string>;
}

function createFunctionArchive(name: string): { archivePath: string; sourceMd5hash: string } {
    const zip = new AdmZip();
    zip.addLocalFile(path.join("cloudfunction", "main.py"), "", "main.py");
    zip.addLocalFile(path.join("cloudfunction", "requirements.txt"), "", "requirements.txt");

    const buffer = zip.toBuffer();
    const archivePath = path.join(os.tmpdir(), `${name}-cloudfunction.zip`);
    fs.writeFileSync(archivePath, buffer);

    return {
        archivePath,
        sourceMd5hash: crypto.createHash("md5").update(buffer).digest("base64"),
    };
}

export class WebhookIngestion {
    public readonly endpointUrl: pulumi.Output<string>;
    public readonly functionName: pulumi.Output<string>;

    constructor(name: string, args: WebhookIngestionArgs) {
        const { archivePath, sourceMd5hash } = createFunctionArchive(name);

        const serviceAccountSuffix = new random.RandomString(`${name}-service-account-suffix`, {
            length: 8,
            special: false,
            upper: false,
        });

        const serviceAccount = new gcp.serviceaccount.Account(`${name}-service-account`, {
            accountId: serviceAccountSuffix.result.apply((value) => `w2sf-${value}`),
            displayName: "Webhook ingestion function",
        });

        new gcp.storage.BucketIAMMember(`${name}-bucket-writer`, {
            bucket: args.bucketName,
            role: "roles/storage.objectCreator",
            member: serviceAccount.email.apply((email) => `serviceAccount:${email}`),
        });

        const sourceObject = new gcp.storage.BucketObject(`${name}-source-object`, {
            bucket: args.bucketName,
            name: "deployments/cloudfunction-source.zip",
            source: new pulumi.asset.FileAsset(archivePath),
            sourceMd5hash,
        });

        const fn = new gcp.cloudfunctionsv2.Function(`${name}-function`, {
            name: `webhook-to-snowflake-${name}`,
            location: args.region,
            buildConfig: {
                runtime: "python311",
                entryPoint: "webhook",
                source: {
                    storageSource: {
                        bucket: args.bucketName,
                        object: sourceObject.name,
                    },
                },
            },
            serviceConfig: {
                availableMemory: "256M",
                timeoutSeconds: 30,
                ingressSettings: "ALLOW_ALL",
                allTrafficOnLatestRevision: true,
                serviceAccountEmail: serviceAccount.email,
                environmentVariables: {
                    LANDING_BUCKET: args.bucketName,
                    LANDING_PREFIX: "incoming",
                    WEBHOOK_SECRET: args.webhookSecret,
                },
            },
        });

        new gcp.cloudfunctionsv2.FunctionIamMember(`${name}-function-invoker`, {
            project: args.project,
            location: fn.location,
            cloudFunction: fn.name,
            role: "roles/cloudfunctions.invoker",
            member: "allUsers",
        });

        new gcp.cloudrun.IamMember(`${name}-run-invoker`, {
            project: args.project,
            location: fn.location,
            service: fn.name,
            role: "roles/run.invoker",
            member: "allUsers",
        });

        this.endpointUrl = fn.serviceConfig.apply((sc) => sc?.uri ?? "");
        this.functionName = fn.name;
    }
}

components/batch_pipeline.ts

Creates the Snowflake-side loading resources for this setup: the storage integration and external GCS stage over the landing bucket, the destination table, and the scheduled task that runs batch COPY INTO.

import * as gcp from "@pulumi/gcp";
import * as pulumi from "@pulumi/pulumi";
import * as snowflake from "@pulumi/snowflake";

export interface BatchPipelineArgs {
    bucketName: pulumi.Input<string>;
    database: pulumi.Input<string>;
    schemaName: pulumi.Input<string>;
    warehouseName: pulumi.Input<string>;
    taskIntervalMinutes: number;
}

function copyIntoStatement(database: pulumi.Input<string>, schemaName: pulumi.Input<string>) {
    return pulumi.all([database, schemaName]).apply(([currentDatabase, currentSchema]) =>
        `COPY INTO "${currentDatabase}"."${currentSchema}"."WEBHOOK_EVENTS" ` +
        `FROM (SELECT metadata$filename, metadata$file_last_modified, $1, sysdate() ` +
        `FROM @"${currentDatabase}"."${currentSchema}"."WEBHOOK_EVENTS_STAGE") ` +
        "FILE_FORMAT = (TYPE = JSON)",
    );
}

export class BatchPipeline {
    public readonly stageName: pulumi.Output<string>;
    public readonly tableName: pulumi.Output<string>;
    public readonly taskName: pulumi.Output<string>;

    constructor(name: string, args: BatchPipelineArgs) {
        const previewProvider = new snowflake.Provider(`${name}-preview-provider`, {
            previewFeaturesEnabled: [
                "snowflakeStageExternalGcsResource",
                "snowflakeStorageIntegrationResource",
                "snowflakeTableResource",
            ],
        });
        const previewOpts = { provider: previewProvider };

        const table = new snowflake.Table(`${name}-table`, {
            database: args.database,
            schema: args.schemaName,
            name: "WEBHOOK_EVENTS",
            columns: [
                { name: "FILENAME", type: "STRING", nullable: false },
                { name: "LAST_MODIFIED_AT", type: "TIMESTAMP_NTZ", nullable: false },
                { name: "CONTENT", type: "VARIANT" },
                { name: "LOADED_AT", type: "TIMESTAMP_NTZ" },
            ],
        }, previewOpts);

        const stageUrl = pulumi.output(args.bucketName).apply((bucketName) => `gcs://${bucketName}/incoming/`);
        const storageIntegration = new snowflake.StorageIntegration(`${name}-storage-integration`, {
            name: "WEBHOOK_EVENTS_STORAGE_INTEGRATION",
            enabled: true,
            storageProvider: "GCS",
            storageAllowedLocations: [stageUrl],
        }, previewOpts);

        const storageMember = storageIntegration.storageGcpServiceAccount.apply((serviceAccount) => `serviceAccount:${serviceAccount}`);
        const bucketReader = new gcp.storage.BucketIAMMember(`${name}-bucket-reader`, {
            bucket: args.bucketName,
            role: "roles/storage.objectViewer",
            member: storageMember,
        });
        const bucketMetadataReader = new gcp.storage.BucketIAMMember(`${name}-bucket-metadata-reader`, {
            bucket: args.bucketName,
            role: "roles/storage.legacyBucketReader",
            member: storageMember,
        });

        const stage = new snowflake.StageExternalGcs(`${name}-stage`, {
            database: args.database,
            schema: args.schemaName,
            name: "WEBHOOK_EVENTS_STAGE",
            url: stageUrl,
            storageIntegration: storageIntegration.name,
        }, pulumi.mergeOptions(previewOpts, { dependsOn: [bucketReader, bucketMetadataReader] }));

        const task = new snowflake.Task(`${name}-task`, {
            database: args.database,
            schema: args.schemaName,
            name: "WEBHOOK_EVENTS_TASK",
            warehouse: args.warehouseName,
            started: true,
            schedule: { minutes: args.taskIntervalMinutes },
            sqlStatement: copyIntoStatement(args.database, args.schemaName),
        });

        this.stageName = stage.fullyQualifiedName;
        this.tableName = table.name;
        this.taskName = task.name;
    }
}

Verify the data landed

After you send a test event and the task has had a chance to run, query Snowflake to confirm the records are visible (substitute your own database and schema names if you changed the defaults):

SELECT FILENAME,
       LAST_MODIFIED_AT,
       CONTENT,
       LOADED_AT
FROM LANDING_ZONE_WEBHOOKS.RAW.WEBHOOK_EVENTS
ORDER BY LOADED_AT DESC;

For this path, payloads stay in GCS until the Snowflake task runs COPY INTO against the external stage.
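
If the table is still empty because the task has not run yet, you can check the webhook-to-GCS half of the path on its own by listing the staged objects. A minimal sketch with the google-cloud-storage client, assuming you paste in the bucket name from the landing_bucket_name stack output:

# Hypothetical spot check: list payloads staged under incoming/ before the next task run.
from google.cloud import storage

bucket_name = "<value of pulumi stack output landing_bucket_name>"
client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix="incoming/"):
    print(blob.name, blob.size, blob.updated)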

Operating notes

  • Keep the first table as a raw landing zone. Flatten and model into downstream tables later.
  • Rotate the shared webhook secret when you roll senders or suspect exposure; see the rotation sketch after this list.
  • Watch the landing storage path and Snowflake task history so failed loads and malformed payloads do not go unnoticed.
  • Use a least-privilege Snowflake reader role for analysts instead of querying with the loading role.
  • When you choose batch loading, tune taskIntervalMinutes to match how quickly you want new files copied into Snowflake and how much warehouse activity you want between loads.
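
For secret rotation, one low-touch option is a keepers entry on the RandomPassword in the entrypoint: changing the keeper value on a later pulumi up replaces the secret, and because both the Cloud Function environment and the GitHub webhook configuration reference webhook_secret.result, both pick up the new value in the same deploy. A hedged sketch (the keeper key and value are arbitrary labels):

# Hypothetical rotation handle: bump the "rotated" value to force a new secret.
webhook_secret = random.RandomPassword(
    "webhook-secret",
    length=32,
    special=False,
    keepers={"rotated": "2024-01"},
)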

Frequently asked questions

When should I choose batch loading?
Choose batch loading when you want predictable load windows, lower always-on activity, or tighter control over when COPY INTO runs. This blueprint provisions a Snowflake task that runs once an hour by default, so the path is still end to end. Tune taskIntervalMinutes if you want a tighter or looser cadence.
Can I keep the raw payloads in cloud storage?
Yes. Every path writes the raw payloads to cloud storage before Snowflake loads them (S3 on AWS, Blob Storage on Azure, Cloud Storage on GCP). See the variant page you picked for specifics on how the loading path reads from or retains those objects.