Kubernetes MongoDB for Stateful AI Workloads

Question

Pulumi · Accepted Answer

Deploying MongoDB on Kubernetes for stateful AI workloads calls for a `StatefulSet`, which gives each MongoDB pod a stable, unique network identity and its own persistent storage, so data survives pod restarts and rescheduling.

Here's how we accommodate these needs:

1. **StatefulSet**: Unlike `Deployments`, `StatefulSet` is the Kubernetes workload API object used to manage stateful applications with unique, persistent identities and stable hostnames.
2. **PersistentVolume (PV)** and **PersistentVolumeClaim (PVC)**: These objects store data that your applications can use without tying the lifecycle of the data to the lifecycle of the Kubernetes pods.
3. **Headless Service**: A Service with `clusterIP: None` that governs the network domain of the `StatefulSet`. Each pod gets its own DNS entry in the service domain as it starts, which gives it a stable network identity.
4. **StorageClass**: Defines a class of storage (backing storage system, provisioner, reclaim policy, and so on). Create one only if the cluster's default class does not fit your needs; a claim that does not name a class falls back to the cluster's default StorageClass.
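To make the "stable identity" idea concrete, here is a minimal sketch of the naming conventions Kubernetes applies to StatefulSet pods and their claims. The helper names and the example values (`mongo` for both the StatefulSet and the Service, the `default` namespace, the `cluster.local` cluster domain) are assumptions for illustration; substitute the names your cluster actually uses.

```python
from typing import List


def pod_dns_names(statefulset: str, service: str, namespace: str = "default",
                  replicas: int = 3, cluster_domain: str = "cluster.local") -> List[str]:
    """Stable DNS name of each StatefulSet pod behind a headless Service.

    Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


def pvc_names(claim_template: str, statefulset: str, replicas: int = 3) -> List[str]:
    """Name of the PersistentVolumeClaim created for each pod.

    Pattern: <claim template name>-<statefulset>-<ordinal>
    """
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


# Hypothetical names, used only for illustration.
for host in pod_dns_names("mongo", "mongo"):
    print(host)
for claim in pvc_names("mongo-persistent-storage", "mongo"):
    print(claim)
```

Because the ordinal is part of both names, a rescheduled pod `mongo-1` comes back with the same hostname and reattaches to the same claim.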

Below is a Pulumi program written in Python. It demonstrates how to deploy MongoDB on a Kubernetes cluster using these Kubernetes primitives:

```python
import pulumi
import pulumi_kubernetes as k8s

# The following MongoDB specification assumes that you have a Kubernetes cluster up and running.

# Define a headless service for MongoDB to control the domain of the StatefulSet
mongo_service = k8s.core.v1.Service("mongo-service",
    spec=k8s.core.v1.ServiceSpecArgs(
        cluster_ip="None",  # For a headless service, you set ClusterIP to 'None'
        ports=[k8s.core.v1.ServicePortArgs(
            port=27017,  # MongoDB port
        )],
        selector={
            "app": "mongo",  # Must match the pod labels in the StatefulSet template
        },
    ))

# Define a StatefulSet for MongoDB
mongo_statefulset = k8s.apps.v1.StatefulSet("mongo-statefulset",
    spec=k8s.apps.v1.StatefulSetSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(
            match_labels={
                "app": "mongo",
            },
        ),
        service_name=mongo_service.metadata.name,
        replicas=3,  # Three MongoDB pods; adjust to your availability requirements
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(
                labels={
                    "app": "mongo",
                },
            ),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[
                    k8s.core.v1.ContainerArgs(
                        name="mongo",
                        image="mongo",  # Official MongoDB image; pin a specific version tag for production
                        args=["--replSet", "rs0", "--bind_ip", "0.0.0.0"],  # Set the replica set name and bind to all interfaces; the set must still be initiated with rs.initiate()
                        ports=[k8s.core.v1.ContainerPortArgs(
                            container_port=27017,
                        )],
                        volume_mounts=[k8s.core.v1.VolumeMountArgs(
                            name="mongo-persistent-storage",  # This name should match a volume claim in volumeClaimTemplates
                            mount_path="/data/db",
                        )],
                    ),
                ],
            ),
        ),
        volume_claim_templates=[
            k8s.core.v1.PersistentVolumeClaimArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    name="mongo-persistent-storage",
                ),
                spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
                    access_modes=["ReadWriteOnce"],  # Each pod gets its own volume, so single-node read-write access is sufficient
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        requests={"storage": "1Gi"},  # Request storage space as needed
                    ),
                )
            ),
        ],
    ))

pulumi.export('mongo_service', mongo_service.metadata.name)
pulumi.export('mongo_statefulset', mongo_statefulset.metadata.name)
```

In the program above:

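- A headless `Service` with the Pulumi resource name `mongo-service` is created to govern the network domain of the StatefulSet. Because no `metadata.name` is set, Pulumi auto-names the Kubernetes object by appending a random suffix to that name.
- A `StatefulSet` with the Pulumi resource name `mongo-statefulset` is created with three MongoDB pods running the official MongoDB image. Each pod mounts its own persistent storage at `/data/db` and starts `mongod` with the replica set name `rs0`. The replica set itself still has to be initiated once (for example with `rs.initiate()` in `mongosh`) before the members form a set.
- The `volume_claim_templates` field creates one PersistentVolumeClaim per pod. The PersistentVolumes behind those claims are provisioned dynamically through the cluster's StorageClass, or must be created in advance if the cluster has no dynamic provisioning.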
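Since `--replSet` only names the replica set, one member has to receive an initiation command listing all members. The sketch below builds that configuration document in plain Python and prints an `rs.initiate(...)` call you could paste into `mongosh`. The function name and the host names (a StatefulSet and Service both called `mongo` in the `default` namespace) are assumptions; with Pulumi auto-naming, the real names carry a random suffix.

```python
import json
from typing import Dict, List


def replica_set_config(name: str, hosts: List[str], port: int = 27017) -> Dict:
    """Build the document passed to rs.initiate(): a set name plus numbered members."""
    return {
        "_id": name,
        "members": [
            {"_id": index, "host": f"{host}:{port}"}
            for index, host in enumerate(hosts)
        ],
    }


# Hypothetical pod hostnames, used only for illustration.
hosts = [f"mongo-{i}.mongo.default.svc.cluster.local" for i in range(3)]
config = replica_set_config("rs0", hosts)

# Print a command that can be run once against any one member.
print(f"rs.initiate({json.dumps(config, indent=2)})")
```

Run the printed command once, against a single pod; the other members join after they are contacted by the initiating member.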

You can adjust this program to your needs, for example the storage size or the MongoDB version. After you run it with `pulumi up`, Kubernetes manages a MongoDB StatefulSet whose data persists across pod restarts, which makes it suitable for stateful AI workloads.
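An AI workload running in the same cluster would typically reach the database through a replica set connection string that lists every member. The following is a minimal sketch of assembling one. The helper name, the host names, and the `features` database are hypothetical, and it assumes the replica set `rs0` has already been initiated.

```python
from typing import List


def mongo_uri(hosts: List[str], replica_set: str, database: str = "",
              port: int = 27017) -> str:
    """Assemble a standard mongodb:// connection string for a replica set."""
    seed_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seed_list}/{database}?replicaSet={replica_set}"


# Hypothetical pod hostnames and database name, used only for illustration.
hosts = [f"mongo-{i}.mongo.default.svc.cluster.local" for i in range(3)]
uri = mongo_uri(hosts, "rs0", database="features")
print(uri)
```

The resulting string can be handed to a MongoDB driver, for example PyMongo's `MongoClient(uri)`. This sketch omits authentication and TLS, both of which a production deployment would need.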