Day-two Operation of Multi-cloud Kubernetes and Vault
Talk · October 2021
Snowflake is multi-cloud. So is its infrastructure. For two years, Snowflake’s platform team has been building and operating 100 (and growing) Kubernetes clusters on AWS, Azure, and Google Cloud. Today, we run about 60k Pods on average across those clusters, which unlocks $7M in annual savings.
We use Pulumi to provision cloud resources and manage HashiCorp Vault. In this talk, I will present how Pulumi has enabled Snowflake’s scale and growth:
- How we leverage the Pulumi Automation API to build a custom rollout strategy for all Pulumi stacks
- How we achieve blue-green upgrades for Kubernetes node pools
- How we manage HashiCorp Vault using Pulumi:
  - rotating issuing certs (which sign the Istio private gateway TLS cert)
  - static secrets (such as the Teleport join-root token, so that k8s users can use one CLI to access all clusters and nodes)
  - cloud-provider secret engines (to generate scoped, short-lived tokens for services and automation)
- How we generate and manage cloud-agnostic Kubernetes manifests by integrating with Pulumi stack outputs
- How we use the Pulumi Operator in CI/CD for auto-apply and audit
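To make the first bullet concrete, here is a minimal sketch of what a custom rollout strategy over many stacks could look like. It is not Snowflake's implementation: the wave ordering, the names `plan_waves`, `rollout`, and `RolloutResult`, and the halt-after-failed-wave policy are all assumptions made for illustration. The `update` callable stands in for whatever performs the actual stack update (for example, a function wrapping a Pulumi Automation API stack update), so the sketch itself has no Pulumi dependency.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class RolloutResult:
    """Outcome of one rollout run (hypothetical type for this sketch)."""
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def plan_waves(stacks: Dict[str, str], order: Sequence[str]) -> List[List[str]]:
    """Group stack names into waves by environment label, following `order`.

    `stacks` maps a stack name to its environment label. Stacks whose label
    does not appear in `order` are left out of the plan.
    """
    waves = []
    for env in order:
        wave = sorted(name for name, label in stacks.items() if label == env)
        if wave:
            waves.append(wave)
    return waves


def rollout(waves: List[List[str]], update: Callable[[str], None]) -> RolloutResult:
    """Update stacks wave by wave.

    Assumed policy: every stack in the current wave is attempted, but if any
    stack in a wave fails, all later waves are skipped.
    """
    result = RolloutResult()
    halted = False
    for wave in waves:
        if halted:
            result.skipped.extend(wave)
            continue
        for name in wave:
            try:
                # In a real setup this would trigger the stack update.
                update(name)
                result.succeeded.append(name)
            except Exception:
                result.failed.append(name)
                halted = True
    return result
```

With stacks labeled `dev`, `staging`, and `prod`, calling `rollout(plan_waves(stacks, ["dev", "staging", "prod"]), update)` would update every `dev` stack first and only reach `prod` if the earlier waves succeeded. Keeping the update step behind a plain callable also makes the strategy easy to test without touching any cloud resources.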
This talk differs from the one my colleagues gave at the Cloud Engineering Summit 2020. They focused on the container platform design (logging, monitoring, networking, etc.). I will lean more towards implementation and the day-2 experience of using Pulumi.
Speakers
Charles Xu
Senior Software Engineer, Snowflake