Kubernetes playbooks

Overview

The steps to follow include:

Production Architecture for Teams

Misconfigured infrastructure accounts are the source of a significant number of serious production outages. Many of these failure modes are preventable.

The primary objective of this reference architecture is to create sensible defaults that reduce the likelihood of these errors. Modern infrastructure as code tools, such as Pulumi, are an effective means of accomplishing this goal.

Pulumi tools allow engineering teams to share specifications for what their infrastructure should look like, and to provision and manage that infrastructure reliably. Infrastructure changes can be audited as part of code review, and the shared specifications let teams detect drift.
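As a rough illustration of what detecting drift means, the sketch below compares a desired specification with an observed state and reports the differences. It is a toy model rather than a description of how Pulumi works internally; the function name `detect_drift` and all keys and values are hypothetical.

```python
from typing import Any


def detect_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, list[str]]:
    """Compare a desired spec with observed state and report differences by key."""
    desired_keys, actual_keys = set(desired), set(actual)
    return {
        # Declared in code but absent from the running infrastructure.
        "missing": sorted(desired_keys - actual_keys),
        # Present in the running infrastructure but never declared.
        "unexpected": sorted(actual_keys - desired_keys),
        # Declared and present, but with a different value.
        "changed": sorted(
            key for key in desired_keys & actual_keys if desired[key] != actual[key]
        ),
    }


# Hypothetical settings, used only to exercise the function.
desired_state = {"node_count": 3, "logging": True, "region": "example-region"}
observed_state = {"node_count": 5, "region": "example-region", "public_endpoint": True}

drift = detect_drift(desired_state, observed_state)
print(drift)
```

Because the desired state lives in code, a report like this can be reviewed alongside the change that caused or corrects it.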

This architecture shows how a team can use these tools to understand and apply:

Security: Who has access to what, and how is this policy enforced?
Governance: How do we ensure the blast radius of changes is as small as possible?
Engineering: How do we automate this with CI/CD?

Production Infrastructure as Code

At the core of this architecture is a simple idea: that we should separate resources into loosely-coupled, independently-manageable sets, based on risk and functionality.

We suggest splitting infrastructure up into (roughly) six Pulumi stacks of resources.

1. Identity

Identities and role definitions for organizations and CI/CD are required before anyone can provision anything. This is a requirement for every production Kubernetes deployment.

By isolating resources into loosely-coupled stacks, we can grant minimal permissions based on the principle of least privilege.

The identity stack typically contains:

Identities and roles for the team, e.g. AWS IAM, Google Cloud IAM, Azure AD. For example, the database team typically gets only administrative permissions for the datastores, while an app team might get only cluster developer permissions.
Service accounts for bots and CI/CD. While IAM roles and Active Directory accounts describe the identity of users, service accounts grant an identity to workloads, e.g., Storage CI/CD.
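The principle of least privilege can be sketched as a deny-by-default lookup. The example below is a simplified model, not a real IAM or Kubernetes API; the role names, resource names, and the `is_allowed` helper are all hypothetical and loosely mirror the database-team and app-team example above.

```python
# Hypothetical role table: each role maps to the (resource, action) pairs
# it has been explicitly granted. Nothing else is permitted.
ROLE_PERMISSIONS = {
    "database-admin": {("datastore", "admin")},
    "cluster-developer": {("cluster", "deploy"), ("cluster", "read")},
    "ci-service-account": {("cluster", "deploy")},
}


def is_allowed(role: str, resource: str, action: str) -> bool:
    """Deny by default: allow only pairs explicitly granted to the role."""
    return (resource, action) in ROLE_PERMISSIONS.get(role, set())


# The database team administers datastores but cannot deploy to the cluster.
print(is_allowed("database-admin", "datastore", "admin"))
print(is_allowed("database-admin", "cluster", "deploy"))
# An app developer can deploy but has no datastore administration rights.
print(is_allowed("cluster-developer", "cluster", "deploy"))
print(is_allowed("cluster-developer", "datastore", "admin"))
```

Unknown roles fall through to an empty permission set, so a typo in a role name fails closed rather than open.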

2. Managed Infrastructure

Provisioning shared, managed infrastructure is required to configure the cluster.

At a minimum, this typically includes networking infrastructure, and can often include storage backends along with other cloud services such as VMs, registries, data pipelines, and data warehouses.

3. Kubernetes Cluster

Configure and provision the Kubernetes cluster with the desired settings and defaults.

This also typically involves provisioning the Kubernetes cluster infrastructure with API resources such as Namespaces, Roles, RoleBindings, and Quotas.
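To make those API resources concrete, the sketch below builds a Namespace, a Role, a RoleBinding, and a ResourceQuota as plain dictionaries and prints them as a JSON list. It uses only the Python standard library and is not a Pulumi program; the team name, the granted verbs, the pod limit, and the helper `team_namespace_manifests` are assumptions chosen for illustration.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def team_namespace_manifests(team: str, pod_limit: int) -> list[dict]:
    """Build per-team Namespace, Role, RoleBinding, and ResourceQuota manifests."""
    namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": team}}
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": f"{team}-developer", "namespace": team},
        # "" is the core API group (pods, services); "apps" covers deployments.
        "rules": [{
            "apiGroups": ["", "apps"],
            "resources": ["pods", "services", "deployments"],
            "verbs": ["get", "list", "watch", "create", "update", "delete"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{team}-developer-binding", "namespace": team},
        # Assumes the identity provider exposes a group named after the team.
        "subjects": [{"kind": "Group", "name": team, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": role["metadata"]["name"], "apiGroup": RBAC_API},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        # Quota values are expressed as strings in the manifest.
        "spec": {"hard": {"pods": str(pod_limit)}},
    }
    return [namespace, role, binding, quota]


manifests = team_namespace_manifests("app-team", pod_limit=20)
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Scoping the Role and the quota to one namespace keeps each team's permissions and resource usage contained.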

Using a managed Kubernetes cluster on EKS, GKE, or AKS is the easiest way to deploy a cluster.

4. Cluster Services

With a vanilla cluster running, you can install any Kubernetes cluster-scoped services that will be shared by some or all cluster users.

At a minimum, services that should be installed include centralized cluster and app-based logging, and often include monitoring, policies, and service meshes.

5. App Services

Configure any Kubernetes app-scoped services that will be shared with users who have deployment permissions.

App services tend to include managed datastores (e.g. RDS, Cloud SQL, and CosmosDB), ingress controllers, DNS managers, TLS certificate managers, and app pipelines.

6. Apps

Deploy applications and workloads into the cluster.
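The layering of the six stacks can be sketched as a dependency graph that yields a valid provisioning order. The example below uses Python's standard `graphlib` module; the stack names and the exact dependency edges are assumptions inferred from the descriptions above, not a definitive layout.

```python
from graphlib import TopologicalSorter

# Each stack maps to the stacks it is assumed to depend on.
STACK_DEPENDENCIES = {
    "identity": set(),
    "managed-infrastructure": {"identity"},
    "kubernetes-cluster": {"identity", "managed-infrastructure"},
    "cluster-services": {"kubernetes-cluster"},
    "app-services": {"kubernetes-cluster", "managed-infrastructure"},
    "apps": {"cluster-services", "app-services"},
}


def provisioning_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every stack comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = provisioning_order(STACK_DEPENDENCIES)
print(order)
```

Several orders can satisfy the same graph (cluster services and app services may swap places, for instance), which reflects the point that loosely coupled stacks can be managed independently once their prerequisites exist.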
