1. High-Performance Dataset Access for LLMs via Kubernetes CSI


    To enable high-performance access to datasets for large language models (LLMs) on Kubernetes, we can leverage the Container Storage Interface (CSI). The Kubernetes CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on Kubernetes. It enables storage providers to develop a plugin once and have it work across a number of container orchestration systems.

    Here is how we can achieve it:

    1. CSIDriver: We'll set up a CSIDriver object, which tells Kubernetes how to interact with the installed CSI driver. The driver itself is responsible for provisioning, deprovisioning, attaching, and detaching the storage used by your workloads.

    2. CSIStorageCapacity: This object can be used to represent the storage capacity available to a CSI driver. It allows the scheduler to make more informed decisions during pod placement.

    3. StorageClass: This is used to describe the "classes" of storage offered. Different classes might map to quality-of-service levels or to backup policies, or to arbitrary policies determined by the cluster administrators. Storage classes can also specify volume binding modes, which might be important for high-performance workloads.

    4. PersistentVolume (PV) and PersistentVolumeClaim (PVC): PVs and PVCs are part of the storage API. A PVC is a request for storage by a user, and a PV is a volume plugged into Kubernetes — backed by CSI when the storage provisioner is a CSI driver.

    5. CSINode: This Kubernetes resource is updated by the kubelet and lists the CSI drivers installed on a node.
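    Of these objects, CSIStorageCapacity is worth a brief illustration, since it is usually published per topology segment so the scheduler knows where capacity exists. Below is a minimal Pulumi sketch; the storage class name, namespace, and zone label are hypothetical, and in practice the driver's external-provisioner sidecar typically publishes these objects automatically rather than you creating them by hand.

```python
import pulumi_kubernetes as k8s

# Hypothetical example: advertise 1 TiB of capacity for a "high-performance"
# StorageClass on nodes in zone "zone-a". Normally the CSI driver's
# external-provisioner publishes CSIStorageCapacity objects automatically.
capacity = k8s.storage.v1.CSIStorageCapacity(
    "llm-dataset-capacity",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="llm-dataset-capacity",
        namespace="default",
    ),
    storage_class_name="high-performance",  # assumed StorageClass name
    capacity="1Ti",
    node_topology=k8s.meta.v1.LabelSelectorArgs(
        match_labels={"topology.kubernetes.io/zone": "zone-a"},  # assumed label
    ),
)
```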

    With Pulumi's Kubernetes SDK, you can set up this infrastructure as code, allowing you to version, share, and reuse your configuration.

    Below is a basic Pulumi program in Python that outlines the steps to set up CSI for Kubernetes. Please note that an actual implementation would require a storage system that supports a CSI driver and knowledge of that specific driver's configuration options.

```python
import pulumi
import pulumi_kubernetes as k8s

# A Kubernetes CSI driver is often provided by a separate entity such as a storage vendor.
# We'll set up a CSIDriver resource assuming the driver is already installed in the cluster.
csi_driver = k8s.storage.v1.CSIDriver(
    "csi-driver",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="my-csi-driver",
    ),
    spec=k8s.storage.v1.CSIDriverSpecArgs(
        attach_required=True,     # Requires attachment of volumes to nodes
        pod_info_on_mount=True,   # Passes pod information to the CSI volume driver
        volume_lifecycle_modes=["Persistent"],  # Volume mode supported by the driver
    ),
)

# Set up a StorageClass for the high-performance requirements of LLMs.
storage_class = k8s.storage.v1.StorageClass(
    "high-performance-storage-class",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="high-performance",
    ),
    provisioner="my-csi-driver",  # Must match the name of the installed CSI driver
    parameters={"someParameter": "someValue"},  # CSI-driver-specific parameters
    allow_volume_expansion=True,
    volume_binding_mode="WaitForFirstConsumer",  # Important for high-performance workloads
)

# Create a PersistentVolumeClaim that uses the high-performance StorageClass.
pvc = k8s.core.v1.PersistentVolumeClaim(
    "llm-dataset-pvc",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="llm-dataset",
    ),
    spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
        access_modes=["ReadWriteOnce"],  # Mountable read-write by a single node
        storage_class_name=storage_class.metadata.name,
        resources=k8s.core.v1.ResourceRequirementsArgs(
            requests={"storage": "100Gi"},  # Requested size of the volume
        ),
    ),
)

# Export the names of the StorageClass and the PersistentVolumeClaim.
pulumi.export("storage_class_name", storage_class.metadata.name)
pulumi.export("pvc_name", pvc.metadata.name)
```

    In this program:

    • The CSIDriver configuration is minimal; in a real scenario, you would adjust its properties according to the chosen driver's documentation.
    • The StorageClass is configured with the WaitForFirstConsumer volume binding mode. This delays the binding and provisioning of a PersistentVolume until a pod using the PVC is created, which is more suitable for high-performance storage.
    • A PersistentVolumeClaim is created that references the storage class, requesting a ReadWriteOnce volume sized for the LLM datasets.

    After setting up the above resources in your cluster, your LLM applications can claim the necessary high-performance storage by referencing the appropriate PVCs. The CSI driver will ensure that the storage is correctly provisioned and made available to your workloads.
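    To illustrate that last step, a workload could mount the claim as in the following sketch. The container image name and mount path are placeholders; only the claim name ties back to the PVC defined earlier.

```python
import pulumi_kubernetes as k8s

# Hypothetical training pod that mounts the "llm-dataset" PVC created above.
training_pod = k8s.core.v1.Pod(
    "llm-training-pod",
    spec=k8s.core.v1.PodSpecArgs(
        containers=[
            k8s.core.v1.ContainerArgs(
                name="trainer",
                image="my-registry/llm-trainer:latest",  # placeholder image
                volume_mounts=[
                    k8s.core.v1.VolumeMountArgs(
                        name="dataset",
                        mount_path="/data",  # dataset is visible here in the container
                    ),
                ],
            ),
        ],
        volumes=[
            k8s.core.v1.VolumeArgs(
                name="dataset",
                persistent_volume_claim=k8s.core.v1.PersistentVolumeClaimVolumeSourceArgs(
                    claim_name="llm-dataset",  # matches the PVC defined earlier
                ),
            ),
        ],
    ),
)
```

Because the StorageClass uses WaitForFirstConsumer, the volume is provisioned only once this pod is scheduled, so it lands on a node where the CSI driver can actually attach it.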