Pulumi Kubernetes Operator 2.0
Posted on
A few years ago we released the Pulumi Kubernetes Operator, a cloud-native way to manage and deploy cloud infrastructure using Pulumi from within your Kubernetes environment. We’ve heard your feedback about limitations related to scalability and isolation. Today, we’re excited to announce version 2.0 beta 1 of the Pulumi Kubernetes Operator. We’ve put a new, horizontally scalable architecture in place along with a variety of new security features and customization options. Let’s dig in!
What is the Pulumi Kubernetes Operator?
The Pulumi Kubernetes Operator defines a Kubernetes Custom Resource called pulumi.com/v1/Stack
, which represents a Pulumi stack. The Pulumi stack can be authored in any supported Pulumi language (TypeScript, Python, Go, .NET, Java, YAML) and can deploy and manage cloud infrastructure in any supported cloud (AWS, Azure, GCP, Kubernetes and 60+ additional cloud and SaaS providers). The Pulumi Kubernetes Operator triggers cloud deployments based on changes to the Stack
Custom Resource or the resources it uses.
As a result, the Pulumi Kubernetes Operator enables users to specify the desired state of their cloud infrastructure by using resources managed directly in their Kubernetes cluster. Modifying those Kubernetes resources will trigger creation, updates, and deletion of the underlying cloud infrastructure that they manage.
For example, the Operator can use the Stack
resource below to run a Pulumi program, as defined in a GitHub repository, to deploy NGINX. Subsequent commits to the repository, or changes to the stack’s configuration, will cause the Operator to re-deploy the infrastructure.
apiVersion: pulumi.com/v1
kind: Stack
metadata:
name: nginx-k8s-stack-production
spec:
stack: acmecorp/nginx/production
projectRepo: https://github.com/acmecorp/pulumi-nginx
commit: 2b0889718d3e63feeb6079ccd5e4488d8601e353
The Pulumi Kubernetes Operator can also be used along with Pulumi’s Kubernetes Provider, which allows Pulumi programs to deploy resources to Kubernetes – either in the same cluster as the Operator, or in another cluster. This can be used to package up complex Kubernetes infrastructure into more abstracted units, to mix Kubernetes and cloud infrastructure into a single unit of deployment, to define how application workloads are deployed into a cluster, or for a wide variety of other use cases. The Pulumi Kubernetes Provider has full support for the Kubernetes API available in every Pulumi language, including support for Helm.
What’s New in Pulumi Kubernetes Operator 2.0?
The 2.0 release is based on a whole new architecture for running Pulumi programs in your Kubernetes cluster.
The Operator now allocates a dedicated pod for each Stack
to serve as the execution environment for Pulumi stack operations.
Previously, all stack operations took place within the Operator’s own pod. This new approach effectively isolates
each stack’s compute and memory resources, improves the isolation of secrets, and opens up new customization options.
These pods are referred to as workspace pods. Under the hood, a new Workspace
Custom Resource handles the lifecycle
of the workspace pod, and is created automatically by the system. A related Update
Custom Resource applies a stack operation to a given workspace.
The Stack
Custom Resource is still the primary API for using the Pulumi Kubernetes Operator and is largely unchanged.
See the Migration section below to understand how to prepare your Stack
objects for use with the 2.0 release.
Note: a key limitation of the new system is that cross-namespace references are strictly forbidden. For example, a Stack
cannot refer
to a Secret
in another namespace.
Installation
With the 2.0 release, the supported installation modes are limited to a cluster-wide installation. You’ll need to be a cluster admin to install the Operator. The Operator is able to handle stacks across all the namespaces of your cluster.
We provide three ways to install the Operator:
- a Pulumi program (see:
deploy/pulumi-operator-yaml/
) - a Helm chart (see:
deploy/helm/pulumi-operator/
) - a simple manifest file (see:
deploy/yaml/install.yaml
)
Please uninstall any earlier version of the Pulumi Kubernetes Operator, then install the new version using one of the above options.
Note that this will remove any existing Stack
objects in your cluster, so be sure to export their manifests or have
a way to restore them.
Security
To protect your Kubernetes cluster, workspace pods are configured to have minimal permissions as defined by the ‘restricted’ profile of the Pod Security Standards. If necessary, you may configure a stack to use the ‘baseline’ profile.
A Kubernetes ServiceAccount
provides an identity for your workspace pod, and we recommend that you create
a service account for your stack rather than using the default
account. If your Pulumi program uses the
Kubernetes provider to manage resources within the cluster, the stack’s service account will need permissions, e.g.
a RoleBinding
to the ClusterRole
named cluster-admin
. See “Service Accounts” for more information.
The workspace pod has a RPC endpoint that is used by the Operator to run stack operations. The pod protects the
RPC endpoint using Kubernetes RBAC, and for that reason
you must bind the ClusterRole
named system:auth-delegator
to the stack’s service account.
Scalability
Each stack is run in its own “workspace” pod, and so the system scales horizontally to support many stacks.
The workspace pod is, by default, configured to use minimal resources when idle, and to ‘burst’ into the unused
memory and compute resources of the node during a stack operation. You may customize the resource constraints to
fit the needs of your stack by using the workspaceTemplate
field. If your Kubernetes cluster has high utilization,
we recommend increasing the resource requests and limits.
See “Pod Quality of Service Classes” for more information.
The pod is not discarded at the end of a stack operation, instead it remains in-place unless the stack is deleted. We’re considering introducing the option to clean up the workspace pod after each operation to enable you to make a trade-off decision between performance and efficiency. Please let us know if this is something you’d like to see!
Stability
A Kubernetes cluster is a dynamic environment where a given pod may be terminated at any time, e.g. due to maintenance operations,
unplanned node events, or resource pressure. This can cause problems for ongoing stack operations, and historically
may cause a stack to become “locked”, necessitating manual intervention with the use of pulumi cancel
.
The new system has been made more resilient by being responsive to pod termination requests. It responds by proactively sending CTRL-C to the ongoing stack operation, to gracefully wind down the operation.
If your stack has a lot of resources, consider increasing the pod’s termination grace period (default: 10 minutes).
Customization and Extensibility
Let’s discuss how to customize the stack’s environment. The Stack
resource continues to provide ways to set environment variables
and to set stack configuration values. What’s new is the ability to:
- use a custom Docker image for your stack environment
- set the stack environment’s compute and storage resources
- attach volumes, init containers, and sidecars
This is accomplished using the workspaceTemplate
field of the Stack
resource, to customize the Workspace
object
that is automatically created by the system for a given stack.
By default, the system uses the official pulumi/pulumi Docker image.
You may wish to create your own Docker image that is based on it, e.g. to pre-install your program’s dependencies
(using pulumi install
) for performance reasons. Be prepared to update the image over time to incorporate the latest version of the Pulumi CLI.
One use case for an extra init container is to customize how your program’s sources are obtained. The system supports
Git repositories and Flux sources out-of-the-box, but you have the option of handling it yourself, e.g. to generate
a Pulumi project or to download from another source. You simply inject an init container, mount the share
volume to /share
,
and then place the project files into /share/workspace
. Contact us to learn more.
Here’s an example of a stack with some customizations applied:
apiVersion: pulumi.com/v1
kind: Stack
metadata:
name: my-stack
spec:
...
workspaceTemplate:
spec:
image: example/pulumi-with-dependencies:v3.334
podTemplate:
metadata:
labels:
example.com/mylabel: bar
spec:
terminationGracePeriodSeconds: 3600
tolerations:
- key: "example.com/foo"
operator: "Exists"
effect: "NoSchedule"
initContainers:
- name: extra # add an extra init container
image: busybox
command: ["sh", "-c", "echo 'Hello, extra init container!'"]
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
volumeMounts:
- name: share
mountPath: /share
containers:
- name: pulumi # customize the main container where Pulumi stack operations are executed
volumeMounts:
- name: oidc-token
mountPath: /var/run/secrets/pulumi
volumes:
- name: oidc-token # mount an OIDC token
projected:
sources:
- serviceAccountToken:
audience: urn:pulumi:org:ORG_NAME
path: token
expirationSeconds: 3600
Operability
Since each stack has its own long-running pod, you may inspect the console output for a given stack by accessing
the pod logs. The pod’s name is based on the stack name; for example, given a stack named my-stack
, look for
a pod named my-stack-workspace-0
. We recommend the stern tool to access pod logs.
See “Pod and container logs” for more information.
If you need to run an interactive Pulumi command for your stack, e.g. pulumi import
, exec into the workspace pod. Navigate to the
/share/workspace
directory and you should find your program there.
Walkthrough
Here’s a quick demonstration of the new system. Let’s deploy the random-yaml program from the pulumi/examples repository.
We’ll be using Flux for this demonstration; to follow along, install Flux (see “Installation”).
Setup the GitRepository Source
For this demonstration, we’ll use Flux to clone and cache the pulumi/examples repository, since it is a large repository and the Flux cache provides a big performance improvement. The system works well without Flux too.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: pulumi-examples
namespace: default
spec:
interval: 24h
ref:
branch: master
url: https://github.com/pulumi/examples
Install a Pulumi Access Token
Create a Secret
containing a Pulumi access token to be used by the stack to authenticate to Pulumi Cloud.
Follow these instructions to create a
personal, organization, or team access token.
Store the access token into a Kubernetes Secret. Here’s an easy way to create a Secret named pulumi-api-secret
.
Replace TOKEN
with the token that you created.
$ kubectl create secret generic pulumi-api-secret -n default --from-literal=accessToken=TOKEN
Create a Service Account
The stack requires a Kubernetes service account with a binding to the system:auth-delegator
cluster role.
If you don’t have admin permissions in your Kubernetes cluster, ask your administrator to create the service account for you.
The Pulumi Kubernetes Operator installer also pre-creates a usable ServiceAccount named pulumi
in the default
namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
name: random-yaml
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: random-yaml:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: random-yaml
namespace: default
Create the Stack
Let’s apply a Stack
object to deploy the program. Note that the specification makes reference to
the GitRepository
(pulumi-examples
), the Secret
(pulumi-api-secret
), and the ServiceAccount
(random-yaml
) created earlier.
apiVersion: pulumi.com/v1
kind: Stack
metadata:
name: random-yaml
namespace: default
spec:
serviceAccountName: random-yaml
fluxSource:
sourceRef:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
name: pulumi-examples
dir: random-yaml/
stack: dev
refresh: true
continueResyncOnCommitMatch: true
resyncFrequencySeconds: 300
destroyOnFinalize: true
envRefs:
PULUMI_ACCESS_TOKEN:
type: Secret
secret:
name: pulumi-api-secret
key: accessToken
This stack is configured to deploy immediately and to resync every five minutes. The term “resync” means to execute
pulumi up
, which is useful for stacks that produce resources or outputs that require a periodic update. Without resync,
pulumi up
runs whenever a new commit is pushed to the repository.
Observe the Workspace
The system automatically creates a Workspace
object named random-yaml
to provision an execution environment.
Let’s watch as the workspace’s status progresses towards readiness. For this demonstration, we’ll use the --watch
flag
to observe the status updates:
$ kubectl get workspace --watch
NAME IMAGE READY ADDRESS
random-yaml pulumi/pulumi:3.134.1-nonroot
random-yaml pulumi/pulumi:3.134.1-nonroot False
random-yaml pulumi/pulumi:3.134.1-nonroot False random-yaml-workspace.default.svc.cluster.local:50051
random-yaml pulumi/pulumi:3.134.1-nonroot False random-yaml-workspace.default.svc.cluster.local:50051
random-yaml pulumi/pulumi:3.134.1-nonroot True random-yaml-workspace.default.svc.cluster.local:50051
Observe the Updates
Once the workspace is ready, the system proceeds to run a periodic update. Each update is represented by
an Update
object. Let’s watch the progression across three updates:
$ kubectl get update --watch
NAME WORKSPACE PROGRESSING FAILED COMPLETE URL
random-yaml-5kmn7dcm random-yaml
random-yaml-5kmn7dcm random-yaml True False False
random-yaml-5kmn7dcm random-yaml False False True https://app.pulumi.com/eron-pulumi-corp/random/dev/updates/1
random-yaml-rftp8gj8 random-yaml
random-yaml-rftp8gj8 random-yaml True False False
random-yaml-rftp8gj8 random-yaml False False True https://app.pulumi.com/eron-pulumi-corp/random/dev/updates/2
random-yaml-sjpw5hm8 random-yaml
random-yaml-sjpw5hm8 random-yaml True False False
random-yaml-sjpw5hm8 random-yaml False False True https://app.pulumi.com/eron-pulumi-corp/random/dev/updates/3
Observe the Stack
The Stack
object maintains a status block showing the result of the latest attempt and whether an update
is currently running. Taking a look, we see a successful outcome.
$ kubectl get stack random-yaml -oyaml
apiVersion: pulumi.com/v1
kind: Stack
...
status:
conditions:
- lastTransitionTime: "2024-10-17T20:41:49Z"
message: the stack has been processed and is up to date
reason: ProcessingCompleted
status: "True"
type: Ready
lastUpdate:
failures: 0
generation: 2
lastAttemptedCommit: sha256:44d554df090dcdeb9bae908cd38155c8c93db05f64dc72982027e8e1294af0d3
lastResyncTime: "2024-10-17T20:41:49Z"
lastSuccessfulCommit: sha256:44d554df090dcdeb9bae908cd38155c8c93db05f64dc72982027e8e1294af0d3
name: random-yaml-sjpw5hm8
permalink: https://app.pulumi.com/eron-pulumi-corp/random/dev/updates/3
state: succeeded
type: up
observedGeneration: 2
outputs:
password: '[secret]'
Delete the Stack
To complete the demonstration, let’s delete the Stack
object. Because spec.destroyOnFinalize
is enabled,
deleting the object causes the Pulumi stack to be destroyed. That is, the stack resources are deleted.
$ kubectl delete stack random-yaml
stack.pulumi.com "random-yaml" deleted
$ kubectl get workspace
No resources found in default namespace.
$ kubectl get update
No resources found in default namespace.
Migration
Ready to leverage these new features? Let’s quickly cover how to migrate your existing setup to Pulumi Kubernetes Operator 2.0.
The Stack
custom resource continues to be the primary way to use the Pulumi Kubernetes Operator, and your existing
stack definitions should continue to work after you follow a simple migration procedure:
- Create a
ServiceAccount
and associatedClusterRoleBinding
as shown in the Security section. - If your stack uses the Pulumi Kubernetes Provider, create a
RoleBinding
or aClusterRoleBinding
to grant the necessary permissions, e.g. to the well-knownClusterRole
namedcluster-admin
. See “Using RBAC Authorization” for more information. - Edit your
Stack
manifest(s) to use the service account, by setting thespec.serviceAccountName
field. - Double-check that your
Stack
objects do not make use of cross-namespace references.
If you’re using a private Git repository as your program source, use the Stack’s secretRefs
field to reference Git credentials.
SSH-based access to the repository may require a custom image with your host’s public key(s) baked in. Contact us to learn more.
If you’re using Flux integration, there’s an additional step to be performed by your cluster administrator.
Please create a NetworkPolicy
to allow the workspace pods to access the Flux source controller. Here’s the manifest
to be applied:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-pulumi-fetch-flux-artifacts
namespace: flux-system
spec:
podSelector:
matchLabels:
app: source-controller
ingress:
- ports:
- protocol: TCP
port: http
from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: default
- podSelector:
matchLabels:
app.kubernetes.io/managed-by: pulumi-kubernetes-operator
app.kubernetes.io/name: pulumi
app.kubernetes.io/component: workspace
policyTypes:
- Ingress
Conclusion
The Pulumi Kubernetes Operator is a great way to deploy and manage cloud infrastructure from within your Kubernetes environment. With the 2.0 release, we’ve solved the scalability and isolation limitations that affected early adopters. You can get started with the Pulumi Kubernetes Operator today. Please join us on the #kubernetes channel of the Pulumi Community Slack workspace to give feedback, and enjoy!