1. Kubernetes Persistent Volumes for Distributed Training Checkpoints

    In a Kubernetes cluster, a PersistentVolume (PV) is a piece of storage that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PersistentVolumes are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This resource is used when you need persistent storage for your application that survives pod restarts or failures.

    A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod in that Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Similarly, PVCs can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).

    To use Persistent Volumes for distributed training checkpoints in a Kubernetes cluster, you need to:

    1. Create a PersistentVolume that represents the physical storage.
    2. Create a PersistentVolumeClaim that a pod will use to request the physical storage.

    Here's a simple program that demonstrates how to define a PersistentVolume and a PersistentVolumeClaim in Pulumi using the Kubernetes provider:

    import pulumi
    import pulumi_kubernetes as kubernetes

    # Define a PersistentVolume using a local path on the node.
    # You can change the storage source to match your requirement
    # (e.g., NFS, iSCSI, or cloud-specific storage).
    persistent_volume = kubernetes.core.v1.PersistentVolume(
        "pv-checkpoints",
        metadata=kubernetes.meta.v1.ObjectMetaArgs(
            name="checkpoints-vol",  # Name of the PV
        ),
        spec=kubernetes.core.v1.PersistentVolumeSpecArgs(
            capacity={"storage": "10Gi"},  # Size of the volume
            access_modes=["ReadWriteOnce"],  # Access mode
            persistent_volume_reclaim_policy="Retain",  # Keep the storage after the claim is released
            host_path=kubernetes.core.v1.HostPathVolumeSourceArgs(
                path="/mnt/data",  # Path on the host node
            ),
        ),
    )

    # Define a PersistentVolumeClaim for the pod to use.
    # The PVC binds to a PV with matching access modes, a matching storage
    # class, and at least the requested capacity.
    persistent_volume_claim = kubernetes.core.v1.PersistentVolumeClaim(
        "pvc-checkpoints",
        metadata=kubernetes.meta.v1.ObjectMetaArgs(
            name="checkpoints-pvc",  # Name of the PVC
        ),
        spec=kubernetes.core.v1.PersistentVolumeClaimSpecArgs(
            access_modes=["ReadWriteOnce"],  # Must match the access modes of the PV
            resources=kubernetes.core.v1.ResourceRequirementsArgs(
                requests={"storage": "10Gi"},  # Requested size of the volume
            ),
        ),
    )

    # Export the names of the volume and claim so they can be referenced later.
    pulumi.export("persistent_volume_name", persistent_volume.metadata["name"])
    pulumi.export("persistent_volume_claim_name", persistent_volume_claim.metadata["name"])
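
    One binding detail worth noting: on clusters that define a default StorageClass, a claim that leaves storage_class_name unset may trigger dynamic provisioning of a brand-new volume instead of binding to the pre-provisioned PV above. As a minimal sketch (the resource names here are placeholders), pinning the claim to static PVs looks like this:

    # Hypothetical variant of the claim: storage_class_name="" disables dynamic
    # provisioning, so the claim binds only to PVs that have no storage class,
    # such as checkpoints-vol above. volume_name pins it to that exact PV.
    static_claim = kubernetes.core.v1.PersistentVolumeClaim(
        "pvc-checkpoints-static",
        metadata=kubernetes.meta.v1.ObjectMetaArgs(name="checkpoints-pvc-static"),
        spec=kubernetes.core.v1.PersistentVolumeClaimSpecArgs(
            access_modes=["ReadWriteOnce"],
            storage_class_name="",  # empty string, not unset: "no dynamic provisioning"
            volume_name="checkpoints-vol",  # optional: bind to this specific PV
            resources=kubernetes.core.v1.ResourceRequirementsArgs(
                requests={"storage": "10Gi"},
            ),
        ),
    )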

    In this program:

    • We begin by importing the required Pulumi packages for Kubernetes.
    • We create a PersistentVolume named pv-checkpoints.
      • The PV is set up to use the local storage on a node in the cluster under the /mnt/data path.
      • We set the capacity to 10Gi and declare the ReadWriteOnce access mode, which means the volume can be mounted read-write by a single node.
      • We set the persistent_volume_reclaim_policy to Retain, which tells Kubernetes to retain the underlying storage when the PV is released from a claim.
    • We define a PersistentVolumeClaim named pvc-checkpoints.
      • The PVC requests access modes and a storage size that the PersistentVolume can satisfy. When the claim is created, Kubernetes binds it to a matching PV, in this case the one we defined above, so any pod that mounts the claim gets that storage.
    • Finally, we export the names of the PV and PVC so they can be easily referenced, for example with pulumi stack output, or when attaching the PVC to the pods in your deployment that perform distributed training (a cross-project variant is sketched below).
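
    If the training workload is defined in a separate Pulumi project, the exported names can be consumed through a StackReference. The stack path below is a placeholder for your own organization/project/stack:

    import pulumi

    # Hypothetical stack path; replace with the stack that created the PVC.
    storage_stack = pulumi.StackReference("my-org/checkpoint-storage/prod")
    pvc_name = storage_stack.get_output("persistent_volume_claim_name")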

    This example uses local storage; adjust the storage source (host_path here) in the PersistentVolume definition to match the storage solution you're actually using. Whether it's a network file system like NFS, block storage like AWS EBS, or any other storage type supported by Kubernetes, the corresponding source block in the PersistentVolume's spec needs to change; an NFS-backed variant is sketched below.
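    For instance, an NFS share is a common choice for shared checkpoints because it is one of the storage types that supports the ReadWriteMany access mode. The server address and export path below are placeholders for your own share:

    # Hypothetical NFS-backed PV; server and path are placeholders.
    nfs_volume = kubernetes.core.v1.PersistentVolume(
        "pv-checkpoints-nfs",
        metadata=kubernetes.meta.v1.ObjectMetaArgs(name="checkpoints-vol-nfs"),
        spec=kubernetes.core.v1.PersistentVolumeSpecArgs(
            capacity={"storage": "10Gi"},
            access_modes=["ReadWriteMany"],  # many nodes may mount read-write
            persistent_volume_reclaim_policy="Retain",
            nfs=kubernetes.core.v1.NFSVolumeSourceArgs(
                server="10.0.0.4",            # placeholder NFS server address
                path="/exports/checkpoints",  # placeholder export path
            ),
        ),
    )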

    You can reference the PVC by name in your Pod spec to mount the volume and store checkpoints during distributed training. Note that with ReadWriteOnce, all pods sharing the claim must be scheduled on the same node; if your training workers are spread across nodes, use a backend and access mode that allow multi-node mounting, such as NFS with ReadWriteMany. A sketch of the pod-side wiring follows.
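
    Here is a minimal sketch of a pod that mounts the claim; the pod name, image, and mount path are placeholders rather than part of the program above:

    # Hypothetical training pod; image and mount path are placeholders.
    trainer = kubernetes.core.v1.Pod(
        "trainer",
        spec=kubernetes.core.v1.PodSpecArgs(
            containers=[kubernetes.core.v1.ContainerArgs(
                name="trainer",
                image="my-registry/trainer:latest",  # placeholder training image
                volume_mounts=[kubernetes.core.v1.VolumeMountArgs(
                    name="checkpoints",
                    mount_path="/checkpoints",  # where the training script writes
                )],
            )],
            volumes=[kubernetes.core.v1.VolumeArgs(
                name="checkpoints",
                persistent_volume_claim=kubernetes.core.v1.PersistentVolumeClaimVolumeSourceArgs(
                    claim_name=persistent_volume_claim.metadata["name"],  # the PVC defined above
                ),
            )],
        ),
    )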