WORKSHOP ON-DEMAND

Deploying LLMs on GKE with NVIDIA GPUs & Google Cloud

This workshop is taught with Pulumi Cloud. Sign up for free to follow along.

Join us for a hands-on workshop where we’ll walk through deploying a large language model (LLM) on Google Kubernetes Engine (GKE) using NVIDIA GPUs and Pulumi’s modern Infrastructure as Code platform.

You’ll follow along as we deploy the open-weight Mixtral 8x7B model using Hugging Face’s Text Generation Inference (TGI) and GKE’s GPU-backed workloads. Learn how to provision GKE clusters with NVIDIA L4 GPUs, containerize AI models, and orchestrate everything with Pulumi and Python, all while leveraging Google Cloud’s scalable infrastructure.
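To give a feel for what gets built, here is a minimal sketch of the container spec a deployment like this centers on: pointing TGI at a model and requesting NVIDIA GPUs from GKE. The function name, image tag, model ID, and GPU count are illustrative assumptions, not the workshop's exact code.

```python
# Hypothetical sketch of the TGI container spec a Pulumi program would
# embed in a Kubernetes Deployment; values here are placeholders.
def tgi_container_spec(model_id: str, gpu_count: int) -> dict:
    """Build a container spec for Hugging Face Text Generation Inference."""
    return {
        "name": "tgi",
        "image": "ghcr.io/huggingface/text-generation-inference:latest",
        # TGI shards the model across the GPUs it is given.
        "args": ["--model-id", model_id, "--num-shard", str(gpu_count)],
        "ports": [{"containerPort": 80}],
        "resources": {
            # Ask GKE's NVIDIA device plugin for L4 GPUs.
            "limits": {"nvidia.com/gpu": gpu_count},
        },
    }

spec = tgi_container_spec("mistralai/Mixtral-8x7B-Instruct-v0.1", 2)
```

In the workshop this spec lives inside a Pulumi Python program, which also provisions the GKE cluster and its GPU node pool.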

Whether you’re building your own AI workloads or curious about how to manage LLMs in production-ready environments, this workshop will give you practical, real-world experience from two cloud and DevOps experts, featuring guest speaker Jason Smith from Google Cloud.

You'll Learn:
How to configure Google Cloud for scalable AI workloads
Deploying infrastructure with Pulumi in Python
Managing GPU-enabled Kubernetes clusters
Serving and testing LLMs on GKE with Hugging Face Inference
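For the last step, serving and testing, a quick smoke test posts a prompt to TGI's `/generate` endpoint and reads back the completion. A minimal sketch of that request/response handling, where the sample response body is a placeholder (a real test would POST the payload to the service URL exposed by GKE):

```python
# Sketch of a smoke test against a TGI endpoint; the sample response
# below stands in for what a live POST to {url}/generate would return.
import json

def build_generate_request(prompt: str, max_new_tokens: int = 64) -> dict:
    """Payload for TGI's POST /generate endpoint."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def extract_text(response_body: str) -> str:
    """Pull the completion out of a TGI /generate JSON response."""
    return json.loads(response_body)["generated_text"]

payload = build_generate_request("What is Kubernetes?")
# A live call would be: requests.post(f"{url}/generate", json=payload)
sample = '{"generated_text": "Kubernetes is a container orchestrator."}'
text = extract_text(sample)
```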

Event Speakers
Engin Diri
Sr. Solutions Architect, Pulumi
Jason Smith
Sr. Cloud Customer Engineer, Google