1. Kubernetes MongoDB for Stateful AI Workloads


    Deploying MongoDB on Kubernetes for stateful AI workloads means running it as a StatefulSet, which gives each pod a stable, unique network identifier and its own persistent storage so that data survives pod restarts and rescheduling.

    Here's how we accommodate these needs:

    1. StatefulSet: The Kubernetes workload API object for managing stateful applications. Unlike a Deployment, it gives each pod a unique, persistent identity and a stable hostname.
    2. PersistentVolume (PV) and PersistentVolumeClaim (PVC): These objects store data that your applications can use without tying the lifecycle of the data to the lifecycle of the Kubernetes pods.
    3. Headless Service: It's used to control the network domain for the StatefulSet. Pods get DNS entries in the service domain as they are spun up, allowing for stable network identification.
    4. StorageClass: This is created if you need to define different classes of storage, which could be backed by different types of storage systems, policies, etc.
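
    To make the stable network identity concrete: each StatefulSet pod behind a headless service gets a DNS name of the form <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>, where the pod name is the StatefulSet name plus an ordinal. The sketch below builds those names and a matching MongoDB connection string. The names "mongo", the "default" namespace, and the "cluster.local" domain are assumptions for illustration; note that Pulumi appends a random suffix to resource names unless metadata.name is set explicitly.

```python
from __future__ import annotations


def pod_hostnames(statefulset: str, service: str, namespace: str = "default",
                  replicas: int = 3, cluster_domain: str = "cluster.local") -> list[str]:
    """Return the stable DNS name of each StatefulSet pod behind a headless service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def mongo_uri(hosts: list[str], port: int = 27017, replica_set: str = "rs0") -> str:
    """Build a MongoDB connection string that lists every replica set member."""
    members = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{members}/?replicaSet={replica_set}"


# Hypothetical names: a StatefulSet and a headless service both called "mongo".
hosts = pod_hostnames("mongo", "mongo")
print(mongo_uri(hosts))
```

    Because these names do not change when a pod is rescheduled, clients and replica set members can refer to each other by hostname rather than by pod IP.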

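    Below is a Pulumi program written in Python that deploys MongoDB on a Kubernetes cluster using these primitives. It defines the headless Service and the StatefulSet explicitly; the PersistentVolumeClaims come from the StatefulSet's volume claim template, and because no StorageClass is named, the cluster's default StorageClass is used: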

    import pulumi
    import pulumi_kubernetes as k8s

    # The following MongoDB specification assumes that you have a Kubernetes cluster up and running.

    # Define a headless service for MongoDB to control the domain of the StatefulSet
    mongo_service = k8s.core.v1.Service("mongo-service",
        spec=k8s.core.v1.ServiceSpecArgs(
            cluster_ip="None",  # For a headless service, you set ClusterIP to 'None'
            ports=[k8s.core.v1.ServicePortArgs(
                port=27017,  # MongoDB port
            )],
            selector={
                "app": "mongo",  # This should match the pod labels in the StatefulSet template
            },
        ))

    # Define a StatefulSet for MongoDB
    mongo_statefulset = k8s.apps.v1.StatefulSet("mongo-statefulset",
        spec=k8s.apps.v1.StatefulSetSpecArgs(
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels={
                    "app": "mongo",
                },
            ),
            service_name=mongo_service.metadata.name,
            replicas=3,  # Considering a production scenario, you can manage replicas as per requirement
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    labels={
                        "app": "mongo",
                    },
                ),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[
                        k8s.core.v1.ContainerArgs(
                            name="mongo",
                            image="mongo",  # Use the official MongoDB image
                            # Name the replica set and bind to all interfaces
                            args=["--replSet", "rs0", "--bind_ip", "0.0.0.0"],
                            ports=[k8s.core.v1.ContainerPortArgs(
                                container_port=27017,
                            )],
                            volume_mounts=[k8s.core.v1.VolumeMountArgs(
                                name="mongo-persistent-storage",  # This name should match a volume claim in volumeClaimTemplates
                                mount_path="/data/db",
                            )],
                        ),
                    ],
                ),
            ),
            volume_claim_templates=[
                k8s.core.v1.PersistentVolumeClaimArgs(
                    metadata=k8s.meta.v1.ObjectMetaArgs(
                        name="mongo-persistent-storage",
                    ),
                    spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
                        access_modes=["ReadWriteOnce"],  # Depending on your provider, you can also use ReadWriteMany
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            requests={"storage": "1Gi"},  # Request storage space as needed
                        ),
                    )
                ),
            ],
        ))

    pulumi.export('mongo_service', mongo_service.metadata.name)
    pulumi.export('mongo_statefulset', mongo_statefulset.metadata.name)

    In the program above:

    • A headless Service, mongo-service, is created to manage the network domain of the StatefulSet.
    • A StatefulSet, mongo-statefulset, runs three replicas of the official MongoDB image and mounts persistent storage into each pod at /data/db. The --replSet flag starts each mongod as a member of a replica set named rs0; the replica set itself still has to be initiated once (for example with rs.initiate()) before the members start replicating.
    • The volume_claim_templates field gives each pod its own PersistentVolumeClaim. The matching PersistentVolumes are provisioned by the cluster's storage provisioner, as selected by the StorageClass.
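
    The claims created from a volume claim template follow a fixed naming pattern, <template-name>-<statefulset-name>-<ordinal>, which is how a rescheduled pod finds its old data again. A minimal sketch of that pattern, assuming the StatefulSet ends up named "mongo" (a hypothetical name; Pulumi appends a random suffix to the resource name unless metadata.name is set):

```python
from __future__ import annotations


def pvc_names(claim_template: str, statefulset: str, replicas: int) -> list[str]:
    """Return the PersistentVolumeClaim name that belongs to each pod ordinal."""
    return [
        f"{claim_template}-{statefulset}-{ordinal}"
        for ordinal in range(replicas)
    ]


# "mongo" is an assumed StatefulSet name; the template name comes from the program above.
claims = pvc_names("mongo-persistent-storage", "mongo", replicas=3)
for claim in claims:
    print(claim)
```

    These claims are not removed by default when the StatefulSet is scaled down or deleted, so the data outlives the pods until the claims are deleted explicitly.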

    This program can be adjusted with your custom specifications like the storage size, MongoDB version, etc. Once you run this Pulumi program, you'll have a MongoDB StatefulSet orchestrated by Kubernetes that can be used for stateful AI workloads, with data persistence ensured.
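
    One step the Pulumi program does not perform is initiating the replica set: passing --replSet rs0 only tells each mongod which replica set it belongs to. As a hedged sketch of that follow-up step, the helper below builds the configuration document that rs.initiate() accepts and prints the command to run once in mongosh on one of the pods. The host names are assumptions (a StatefulSet and headless service both named "mongo" in the "default" namespace); substitute the names your deployment actually produced.

```python
import json


def replica_set_config(name, hosts, port=27017):
    """Build a replica set configuration document with one member per host."""
    return {
        "_id": name,
        "members": [
            {"_id": index, "host": f"{host}:{port}"}
            for index, host in enumerate(hosts)
        ],
    }


def initiate_command(config):
    """Render the mongosh command that initiates the replica set."""
    return f"rs.initiate({json.dumps(config)})"


# Hypothetical pod DNS names; adjust to the real StatefulSet and service names.
example_hosts = [
    f"mongo-{ordinal}.mongo.default.svc.cluster.local" for ordinal in range(3)
]
print(initiate_command(replica_set_config("rs0", example_hosts)))
```

    The replica set name passed here has to match the value given to --replSet, and each member host should be the pod's stable DNS name rather than its IP address.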