Deploying LLMs on GKE with NVIDIA GPUs & Google Cloud

workshop · July 2024

Join us for a hands-on workshop where we’ll walk through deploying a large language model (LLM) on Google Kubernetes Engine (GKE) using NVIDIA GPUs and Pulumi’s modern Infrastructure as Code platform.

You’ll follow along as we deploy the open-weight Mixtral 8x7B model using Hugging Face’s Text Generation Inference (TGI) server on GPU-backed GKE workloads. Learn how to provision GKE clusters with NVIDIA L4 GPUs, containerize AI models, and orchestrate everything with Pulumi and Python, all while leveraging Google Cloud’s scalable infrastructure.
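
To make that concrete, here’s a minimal Pulumi Python sketch of the provisioning step: a GKE cluster plus an L4 GPU node pool. The resource names, region, and machine type (g2-standard-96, a G2 VM shape that bundles eight L4 GPUs) are illustrative assumptions, not the exact workshop code:

    import pulumi
    import pulumi_gcp as gcp

    region = "us-central1"  # illustrative region; pick your own

    # GKE cluster; the default node pool is dropped so we can attach a GPU pool.
    cluster = gcp.container.Cluster(
        "llm-cluster",
        location=region,
        initial_node_count=1,
        remove_default_node_pool=True,
        deletion_protection=False,  # newer provider versions default this to true
    )

    # GPU node pool on G2 VMs, which come with NVIDIA L4 GPUs attached.
    gpu_pool = gcp.container.NodePool(
        "l4-pool",
        cluster=cluster.name,
        location=region,
        node_count=1,
        node_config=gcp.container.NodePoolNodeConfigArgs(
            machine_type="g2-standard-96",  # assumption: 8 x NVIDIA L4 per node
            guest_accelerators=[
                gcp.container.NodePoolNodeConfigGuestAcceleratorArgs(
                    type="nvidia-l4",
                    count=8,
                )
            ],
            oauth_scopes=["https://www.googleapis.com/auth/cloud-platform"],
        ),
    )

    pulumi.export("cluster_name", cluster.name)

Once the nodes are up, the GPUs surface as schedulable nvidia.com/gpu resources (GKE can install the NVIDIA drivers automatically, or you can apply NVIDIA’s driver DaemonSet).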

Whether you’re building your own AI workloads or curious about how to manage LLMs in production-ready environments, this workshop will give you practical, real-world experience from two cloud and DevOps experts, including guest speaker Jason Smith from Google Cloud.

What you'll learn

  • How to configure Google Cloud for scalable AI workloads
  • Deploying infrastructure with Pulumi in Python
  • Managing GPU-enabled Kubernetes clusters
  • Serving and testing LLMs on GKE with Hugging Face’s Text Generation Inference (see the deployment sketch after this list)
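
Serving follows the same pattern. Below is a sketch of what the TGI deployment might look like with Pulumi’s Kubernetes provider, assuming your kubeconfig already points at the new cluster (or that you pass an explicit k8s.Provider); the image tag, shard count, and GPU sizing are assumptions to tune for your hardware:

    import pulumi
    import pulumi_kubernetes as k8s

    labels = {"app": "tgi"}

    # TGI serving Mixtral 8x7B, sharded across the node's L4 GPUs.
    deployment = k8s.apps.v1.Deployment(
        "tgi",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=1,
            selector=k8s.meta.v1.LabelSelectorArgs(match_labels=labels),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(labels=labels),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[
                        k8s.core.v1.ContainerArgs(
                            name="tgi",
                            # Pin a specific TGI release in practice.
                            image="ghcr.io/huggingface/text-generation-inference:latest",
                            args=[
                                "--model-id", "mistralai/Mixtral-8x7B-Instruct-v0.1",
                                "--num-shard", "8",  # assumption: one shard per L4
                            ],
                            ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
                            resources=k8s.core.v1.ResourceRequirementsArgs(
                                limits={"nvidia.com/gpu": "8"},
                            ),
                        )
                    ],
                ),
            ),
        ),
    )

    # Expose TGI behind a cloud load balancer for testing.
    service = k8s.core.v1.Service(
        "tgi-svc",
        spec=k8s.core.v1.ServiceSpecArgs(
            selector=labels,
            type="LoadBalancer",
            ports=[k8s.core.v1.ServicePortArgs(port=80, target_port=80)],
        ),
    )

    pulumi.export("tgi_ip", service.status.load_balancer.ingress[0].ip)

TGI listens on port 80 by default and exposes a /generate endpoint you can curl to sanity-check the model once the service has an external IP.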
This workshop uses Pulumi Cloud. Sign up to follow along.

Speakers

Engin Diri
Sr. Solutions Architect, Pulumi

Jason Smith
Sr. Cloud Customer Engineer, Google