Day-two Operation of Multi-cloud Kubernetes and Vault

Session Information

Snowflake is multi-cloud. So is its infrastructure. For two years, Snowflake’s platform team has been building and operating 100 (and growing) Kubernetes clusters on AWS, Azure, and GCP. Today, we run about 60k Pods on average, unlocking $7M in annual savings.

We use Pulumi to provision cloud resources and manage HashiCorp Vault. In this talk, I will present how Pulumi has enabled Snowflake’s scale and growth:

  • How we use the Pulumi Automation API to build a custom rollout strategy for all Pulumi stacks
  • How we achieve blue-green upgrades for Kubernetes node pools
  • How we manage HashiCorp Vault using Pulumi:
    • rotating issuing certs (which sign the Istio private gateway TLS cert)
    • static secrets (such as the Teleport join-root token, so k8s users can use one CLI to access all clusters and nodes)
    • cloud-provider secrets engines (to generate scoped, short-lived tokens for services and automation)
  • How we generate and manage cloud-agnostic Kubernetes manifests by integrating with Pulumi stack outputs
  • How we use the Pulumi Operator in CI/CD for auto-apply and audit
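
As a rough illustration of what a custom rollout strategy across many stacks could look like, here is a minimal sketch of a wave-based rollout. The abstract does not describe Snowflake's actual strategy, so everything here is an assumption: the wave structure, the `staged_rollout` and `RolloutResult` names, and the stack names are hypothetical. The `update` callable stands in for whatever applies one stack; with the Pulumi Automation API it might wrap `pulumi.automation.select_stack(...)` followed by `stack.up()`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    """Outcome of a rollout, recorded per stack name."""
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def staged_rollout(
    waves: Sequence[Sequence[str]],
    update: Callable[[str], None],
) -> RolloutResult:
    """Apply `update` to each stack, one wave at a time.

    Hypothetical policy: every stack in the current wave is attempted,
    but if any stack in a wave fails, all later waves are skipped.
    """
    result = RolloutResult()
    halted = False
    for wave in waves:
        if halted:
            # An earlier wave had a failure: do not touch these stacks.
            result.skipped.extend(wave)
            continue
        for name in wave:
            try:
                update(name)
                result.succeeded.append(name)
            except Exception:
                result.failed.append(name)
                halted = True
    return result


if __name__ == "__main__":
    # Stand-in updater; a real one would call the Automation API.
    def fake_update(stack_name: str) -> None:
        if stack_name == "prod-azure":
            raise RuntimeError("simulated update failure")

    waves = [["dev-aws"], ["prod-aws", "prod-azure"], ["prod-gcp"]]
    print(staged_rollout(waves, fake_update))
```

Passing the updater in as a callable keeps the ordering and halt logic testable without cloud credentials; the policy itself (finish the wave, then stop) is only one of several reasonable choices.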

This talk differs from the one my colleagues gave at the Cloud Engineering Summit 2020. They focused on the container platform design (logging, monitoring, networking, etc.). I will lean more towards implementation and the day-2 experience of using Pulumi.

Presenters
  • Charles Xu
    Senior Software Engineer, Snowflake