1. Persistent Storage for JupyterHub on Kubernetes


    To provision persistent storage for JupyterHub deployed in a Kubernetes cluster, you define a PersistentVolume (PV) and a PersistentVolumeClaim (PVC). JupyterHub uses the PVC to request storage from the PV, which represents a piece of storage in the cluster.

    Here is a brief overview of the resources that we will be using:

    • PersistentVolume (PV): Represents storage provisioned by an administrator. It's a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

    • PersistentVolumeClaim (PVC): Represents a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). In the same way, PVCs can request specific sizes and access modes (e.g., they can be mounted once read/write or many times read-only).

    • StorageClass: Provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels or to backup policies, or to arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated about what classes represent. This concept allows for PVCs to remain portable across clusters if needed.
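    To make the relationship between these three resources concrete, here is a simplified sketch (plain Python, not the real Kubernetes binder) of the fields the control plane compares when matching a PVC to a PV: storage class, access modes, and capacity. The dicts and helper functions are illustrative only; they mirror the resources defined in the program below.

    ```python
    def parse_gib(quantity: str) -> int:
        """Minimal parser for '<n>Gi' quantities, sufficient for this sketch."""
        return int(quantity.removesuffix("Gi"))

    def can_bind(pv: dict, pvc: dict) -> bool:
        """A PV can satisfy a PVC when the storage classes match, the PV supports
        every requested access mode, and its capacity covers the request."""
        return (
            pv["storageClassName"] == pvc["storageClassName"]
            and set(pvc["accessModes"]) <= set(pv["accessModes"])
            and parse_gib(pv["capacity"]) >= parse_gib(pvc["request"])
        )

    # Mirrors the PV and PVC defined in the Pulumi program below.
    pv = {"capacity": "10Gi", "accessModes": ["ReadWriteOnce"], "storageClassName": "jupyterhub-storage"}
    pvc = {"request": "10Gi", "accessModes": ["ReadWriteOnce"], "storageClassName": "jupyterhub-storage"}

    print(can_bind(pv, pvc))  # → True: class, access mode, and size all match.
    ```

    A PVC asking for "20Gi", a different storage class, or an access mode the PV does not offer would not bind, which is why the PV and PVC definitions below keep these fields in agreement.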

    Below is a Pulumi program that defines a PersistentVolume, a PersistentVolumeClaim, and an optional StorageClass for use with JupyterHub on Kubernetes.

    import pulumi
    import pulumi_kubernetes as k8s

    # Define a StorageClass (if necessary for your cluster).
    # The StorageClass resource defines a class of storage within Kubernetes.
    storage_class = k8s.storage.v1.StorageClass(
        "jupyterhub-storage-class",
        metadata=k8s.meta.v1.ObjectMetaArgs(
            name="jupyterhub-storage",  # Name of the StorageClass.
        ),
        provisioner="k8s.io/minikube-hostpath",  # Use your provider's provisioner on a cloud such as AWS or GCP.
        reclaim_policy="Retain",  # "Retain" keeps the volume after the PVC is deleted; "Delete" removes it automatically.
        volume_binding_mode="Immediate",  # Bind immediately, or "WaitForFirstConsumer" to wait for a pod.
        # Documentation: https://www.pulumi.com/registry/packages/kubernetes/api-docs/storage/v1/storageclass/
    )

    # Define a PersistentVolume, which provides the actual storage resource.
    # You'd configure access to the storage backend here. We'll fake it with a local path,
    # but on a real Kubernetes cluster this might be an NFS mount, a cloud volume, etc.
    persistent_volume = k8s.core.v1.PersistentVolume(
        "jupyterhub-pv",
        metadata=k8s.meta.v1.ObjectMetaArgs(
            name="jupyterhub-pv",  # Name of the PV.
        ),
        spec=k8s.core.v1.PersistentVolumeSpecArgs(
            capacity={"storage": "10Gi"},  # Size of the volume.
            access_modes=["ReadWriteOnce"],  # ReadWriteOnce, ReadOnlyMany, or ReadWriteMany.
            persistent_volume_reclaim_policy="Retain",  # What happens to the PV when the PVC is deleted.
            storage_class_name=storage_class.metadata.name,  # Associate this PV with our StorageClass.
            host_path=k8s.core.v1.HostPathVolumeSourceArgs(
                path="/mnt/data",  # Path on the host. Replace with the correct config for NFS, iSCSI, etc.
            ),
            # Documentation: https://www.pulumi.com/registry/packages/kubernetes/api-docs/core/v1/persistentvolume/
        ),
    )

    # Define a PersistentVolumeClaim that will use the PV above for JupyterHub.
    # JupyterHub will use this PVC to request the PV's resources.
    persistent_volume_claim = k8s.core.v1.PersistentVolumeClaim(
        "jupyterhub-pvc",
        metadata=k8s.meta.v1.ObjectMetaArgs(
            name="jupyterhub-pvc",  # Name of the PVC; referenced in JupyterHub's configuration.
        ),
        spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
            access_modes=["ReadWriteOnce"],  # Must match the PV's access modes.
            resources=k8s.core.v1.ResourceRequirementsArgs(
                requests={"storage": "10Gi"},  # Request up to the PV's capacity.
            ),
            storage_class_name=storage_class.metadata.name,  # Request a PV from the jupyterhub StorageClass.
            # Documentation: https://www.pulumi.com/registry/packages/kubernetes/api-docs/core/v1/persistentvolumeclaim/
        ),
    )

    # Export the name of the PVC so it can be used in the JupyterHub deployment.
    # The PVC name will need to be set in the JupyterHub Helm chart's configuration.
    pulumi.export("jupyterhub_pvc_name", persistent_volume_claim.metadata.name)

    # Note that JupyterHub should be deployed with a Helm chart or another method,
    # and its configuration should reference this PVC as the storage for user data.
    # The deployment part is not shown here.

    In this Pulumi program:

    • We created a StorageClass resource that defines a class of storage within Kubernetes.
    • We then defined a PersistentVolume (PV) that provides the actual storage resource. Typically, the PersistentVolume would be configured to connect to NFS, iSCSI, cloud storage, etc. For simplicity, this example uses a host path on the local filesystem, which is only suitable for single-node clusters such as minikube or kind.
    • A PersistentVolumeClaim (PVC) is specified as a storage request used by JupyterHub. We set the PVC to ask for the same storage size as the PV and to use the storage class we defined.
    • Lastly, we export the PVC name so it can be referenced in the JupyterHub configuration, which is outside the scope of this code and typically managed as a Helm chart.

    To deploy JupyterHub with this persistent storage, reference the PVC in JupyterHub's Helm values file (if using Helm) or in the JupyterHub deployment manifest. The jupyterhub_pvc_name export provides the PVC name needed for that configuration.
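    As an illustration, the zero-to-jupyterhub Helm chart lets user storage point at a pre-created PVC through its singleuser.storage settings. The values below are a sketch of that wiring; the key names follow the chart's static-storage schema, but verify them against your chart version before deploying.

    ```python
    # Hypothetical Helm values for the JupyterHub (zero-to-jupyterhub) chart that
    # mount the pre-provisioned PVC into every single-user server.
    jupyterhub_values = {
        "singleuser": {
            "storage": {
                "type": "static",  # Use an existing PVC instead of dynamic provisioning.
                "static": {
                    "pvcName": "jupyterhub-pvc",   # The PVC name exported by the Pulumi program.
                    "subPath": "home/{username}",  # Per-user directory on the shared volume.
                },
            },
        },
    }

    # These values would then be passed to the chart, e.g. with Pulumi:
    #   k8s.helm.v3.Chart("jupyterhub", k8s.helm.v3.ChartOpts(
    #       chart="jupyterhub",
    #       fetch_opts=k8s.helm.v3.FetchOpts(repo="https://hub.jupyter.org/helm-chart/"),
    #       values=jupyterhub_values,
    #   ))
    ```

    Using a single shared PVC with a per-user subPath keeps all home directories on one volume; alternatively, the chart can create one PVC per user via its dynamic storage type.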