1. Stateful Workloads with Kubernetes and Persistent Volumes


    In Kubernetes, stateful workloads are applications that require persistent storage to maintain state between sessions or pod restarts. Unlike stateless applications, where any pod can handle a request indistinguishably, stateful applications need to persist data. For stateful workloads, Kubernetes provides the StatefulSet, an API object that manages the deployment and scaling of a set of Pods and maintains a sticky identity for each of them.
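    As a sketch of what "sticky identity" means in practice (the names "web" and "nginx" here are illustrative, not from the program below): a StatefulSet's Pods are numbered by ordinal and keep both their names and, via the governing headless Service, their DNS entries across restarts.

```python
# Sketch: stable identities produced by a hypothetical StatefulSet named
# "web" with 3 replicas, governed by a headless Service named "nginx".
replicas = 3

# Pod names are <statefulset-name>-<ordinal> and survive restarts.
pods = [f"web-{i}" for i in range(replicas)]
print(pods)  # ['web-0', 'web-1', 'web-2']

# Each Pod also gets a stable DNS entry of the form
# <pod-name>.<service-name>.<namespace>.svc.cluster.local
dns = [f"{p}.nginx.default.svc.cluster.local" for p in pods]
print(dns[0])  # web-0.nginx.default.svc.cluster.local
```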

    A Persistent Volume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using a Storage Class. PVs are resources in the cluster, just as nodes are cluster resources. A Persistent Volume Claim (PVC) is a user's request for storage, which is fulfilled by binding to a matching Persistent Volume.
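    In manifest form, a PVC is a small object stating how much storage is needed and how it may be mounted. A minimal sketch as a plain dict (the `storageClassName` value "standard" is an assumption; your cluster's default Storage Class may differ):

```python
# Minimal PVC manifest sketch; field names follow the Kubernetes core/v1 API.
pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],  # read-write by a single node
        "resources": {"requests": {"storage": "1Gi"}},  # ask for 1 GiB
        "storageClassName": "standard",  # assumed; cluster-dependent
    },
}
print(pvc_manifest["spec"]["resources"]["requests"]["storage"])  # 1Gi
```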

    A StatefulSet uses Persistent Volume Claims to provide stable and reliable storage for stateful applications. When you create a StatefulSet, you define a volume claim template; for each Pod in the StatefulSet, Kubernetes creates a Persistent Volume Claim from that template and binds it to a Persistent Volume.
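    The per-Pod claims follow a predictable naming scheme, `<template-name>-<statefulset-name>-<ordinal>`, which is how a restarted Pod finds its old volume again. A small sketch (the names "data" and "web" are illustrative):

```python
# Sketch: PVC names Kubernetes derives from a volume claim template.
# Pattern: <template-name>-<statefulset-name>-<ordinal>
template_name = "data"
sts_name = "web"
replicas = 3

pvc_names = [f"{template_name}-{sts_name}-{i}" for i in range(replicas)]
print(pvc_names)  # ['data-web-0', 'data-web-1', 'data-web-2']
```

    Because the name is a pure function of the template, StatefulSet, and ordinal, Pod `web-1` reattaches to `data-web-1` after any restart or rescheduling.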

    Here is a basic Pulumi program written in Python which sets up a StatefulSet with an associated Persistent Volume Claim for each replica:

```python
import pulumi
from pulumi_kubernetes.apps.v1 import StatefulSet

# Volume claim template: the StatefulSet uses this to create one
# Persistent Volume Claim per replica, so no standalone PVC resource
# is needed.
pvc_template = {
    "metadata": {
        "name": "data",
    },
    "spec": {
        "accessModes": ["ReadWriteOnce"],  # mountable read-write by a single node
        "resources": {
            "requests": {
                "storage": "1Gi",  # request 1 GiB of space
            },
        },
    },
}

# Define the StatefulSet that references the volume claim template
stateful_set = StatefulSet(
    "example-statefulset",
    spec={
        "serviceName": "example",  # the headless Service that governs this StatefulSet
        "replicas": 3,  # number of Pods in the StatefulSet
        "selector": {
            "matchLabels": {
                "app": "example",  # must match the Pod template's labels
            },
        },
        "template": {
            "metadata": {
                "labels": {
                    "app": "example",
                },
            },
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx:1.14.2",
                    "ports": [{
                        "containerPort": 80,
                        "name": "web",
                    }],
                    "volumeMounts": [{
                        "mountPath": "/usr/share/nginx/html",
                        "name": "data",  # must match the claim template's name
                    }],
                }],
            },
        },
        "volumeClaimTemplates": [pvc_template],  # one PVC created per Pod
    })

# Export the StatefulSet name
pulumi.export('stateful_set_name', stateful_set.metadata['name'])
```

    In the above code:

    • We first define a volume claim template, which describes the storage each Pod needs. It's essentially a ticket each Pod uses to request storage resources from Kubernetes.

    • Then, we create a StatefulSet. This object manages Pods based on the provided template and ensures that the specified number of Pods, each with a unique, persistent identity, are running.

    • Inside the StatefulSet, we define a template that Kubernetes uses to create new pods. The volumeMounts in the container specification tell each Pod to mount the persistent storage to a specified path inside the container.
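    The link between the claim template and the container is the shared name "data": the `name` in `volumeMounts` must match the claim template's `metadata.name`, or the Pod has no volume to mount. A minimal sketch of that invariant:

```python
# Sketch: the volumeMount name must match the claim template's metadata.name.
claim_template = {
    "metadata": {"name": "data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}
volume_mount = {"mountPath": "/usr/share/nginx/html", "name": "data"}

# If these names diverge, the kubelet cannot wire the claim into the Pod.
names_match = volume_mount["name"] == claim_template["metadata"]["name"]
print(names_match)  # True
```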

    When the above code is deployed with Pulumi, it will create a StatefulSet resource in your Kubernetes cluster, and Kubernetes will create one PVC per Pod from the volume claim template. Each Pod mounts its own claim at the path /usr/share/nginx/html, and files written there persist across Pod restarts.

    Remember to have a Kubernetes cluster available and configured for Pulumi to use, and the necessary permissions to create, update, and delete resources in the cluster. If you're running this locally, Pulumi will use your kubectl configuration to communicate with the cluster. If you're using an automated CI/CD pipeline, you need to set up credentials accordingly.