Kubernetes MongoDB for Stateful AI Workloads
Deploying MongoDB on Kubernetes for stateful AI workloads means running it as a `StatefulSet`, which ensures data persistence by giving each pod a stable, unique network identifier and its own persistent storage. Here's how we accommodate these needs:
- StatefulSet: Unlike a `Deployment`, a `StatefulSet` is the Kubernetes workload API object used to manage stateful applications with unique, persistent identities and stable hostnames.
- PersistentVolume (PV) and PersistentVolumeClaim (PVC): These objects store data that your applications can use without tying the lifecycle of the data to the lifecycle of the Kubernetes pods.
- Headless Service: It controls the network domain for the `StatefulSet`. Pods get DNS entries in the service domain as they are spun up, which gives them stable network identities.
- StorageClass: Create one if you need to define different classes of storage, which could be backed by different types of storage systems, policies, and so on.
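To make the stable network identity concrete, here is a minimal sketch of how the per-pod DNS names are composed. Kubernetes names StatefulSet pods `<statefulset>-<ordinal>`, and a headless Service publishes each pod as `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The names `mongo`, the `default` namespace, and the `cluster.local` domain below are assumptions for illustration; substitute the names your cluster actually uses.

```python
from typing import List


def statefulset_pod_hosts(
    statefulset: str,
    service: str,
    replicas: int,
    namespace: str = "default",           # assumed namespace
    cluster_domain: str = "cluster.local",  # assumed (default) cluster domain
) -> List[str]:
    """Return the stable DNS name of each pod in a StatefulSet."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names: a StatefulSet and headless Service both called "mongo".
hosts = statefulset_pod_hosts("mongo", "mongo", replicas=3)
for host in hosts:
    print(host)
```

Because these names do not change when a pod is rescheduled, clients and other replica set members can keep addressing the same member across restarts.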
Below is a Pulumi program written in Python. It demonstrates how to deploy MongoDB on a Kubernetes cluster using these Kubernetes primitives:
```python
import pulumi
import pulumi_kubernetes as k8s

# The following MongoDB specification assumes that you have a Kubernetes cluster up and running.

# Define a headless service for MongoDB to control the domain of the StatefulSet
mongo_service = k8s.core.v1.Service(
    "mongo-service",
    spec=k8s.core.v1.ServiceSpecArgs(
        cluster_ip="None",  # For a headless service, you set ClusterIP to 'None'
        ports=[k8s.core.v1.ServicePortArgs(
            port=27017,  # MongoDB port
        )],
        selector={
            "app": "mongo",  # This should match the selector of the StatefulSet
        },
    ))

# Define a StatefulSet for MongoDB
mongo_statefulset = k8s.apps.v1.StatefulSet(
    "mongo-statefulset",
    spec=k8s.apps.v1.StatefulSetSpecArgs(
        selector=k8s.meta.v1.LabelSelectorArgs(
            match_labels={
                "app": "mongo",
            },
        ),
        service_name=mongo_service.metadata.name,
        replicas=3,  # Considering a production scenario, you can manage replicas as per requirement
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(
                labels={
                    "app": "mongo",
                },
            ),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[
                    k8s.core.v1.ContainerArgs(
                        name="mongo",
                        image="mongo",  # Use the official MongoDB image
                        # Setup replication and bind to all interfaces
                        args=["--replSet", "rs0", "--bind_ip", "0.0.0.0"],
                        ports=[k8s.core.v1.ContainerPortArgs(
                            container_port=27017,
                        )],
                        volume_mounts=[k8s.core.v1.VolumeMountArgs(
                            # This name should match a volume claim in volumeClaimTemplates
                            name="mongo-persistent-storage",
                            mount_path="/data/db",
                        )],
                    ),
                ],
            ),
        ),
        volume_claim_templates=[
            k8s.core.v1.PersistentVolumeClaimArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    name="mongo-persistent-storage",
                ),
                spec=k8s.core.v1.PersistentVolumeClaimSpecArgs(
                    # Depending on your provider, you can also use ReadWriteMany
                    access_modes=["ReadWriteOnce"],
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        requests={"storage": "1Gi"},  # Request storage space as needed
                    ),
                ),
            ),
        ],
    ))

pulumi.export('mongo_service', mongo_service.metadata.name)
pulumi.export('mongo_statefulset', mongo_statefulset.metadata.name)
```
In the program above:
- A `Service` named `mongo-service` is created as a headless service to manage the network domain of the `StatefulSet`.
- A `StatefulSet` named `mongo-statefulset` is created with three replicas of MongoDB pods running the official MongoDB image. It mounts persistent storage into each pod at `/data/db` and starts each `mongod` with `--replSet rs0`. The flag only names the replica set; the set still has to be initiated once (for example with `rs.initiate()`) before it elects a primary.
- The `volume_claim_templates` field gives each pod its own PersistentVolumeClaim, which is bound to a PersistentVolume supplied by the cluster's storage provisioner, so the data stays stable across pod restarts.
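Initiating the replica set requires a configuration document that lists each member's host, and clients then connect with a URI that names the set. The following is a minimal sketch of both, not part of the Pulumi program. It assumes the pods are reachable as `mongo-<n>.mongo.default.svc.cluster.local`; these hostnames are hypothetical, since the real ones depend on the resource names and namespace in your cluster.

```python
import json
from typing import Dict, List


def replica_set_config(name: str, hosts: List[str], port: int = 27017) -> Dict:
    """Build the document passed to rs.initiate() for the given members."""
    return {
        "_id": name,
        "members": [
            {"_id": index, "host": f"{host}:{port}"}
            for index, host in enumerate(hosts)
        ],
    }


def connection_uri(name: str, hosts: List[str], port: int = 27017) -> str:
    """Build a MongoDB connection URI that lists every member as a seed."""
    seeds = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seeds}/?replicaSet={name}"


# Hypothetical pod hostnames; adjust to the names in your cluster.
hosts = [f"mongo-{i}.mongo.default.svc.cluster.local" for i in range(3)]

config = replica_set_config("rs0", hosts)
# Paste the printed command into mongosh on one of the pods to initiate the set.
print(f"rs.initiate({json.dumps(config, indent=2)})")
print(connection_uri("rs0", hosts))
```

Listing all members in the URI lets the driver discover the current primary even when one pod is unavailable.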
You can adjust this program to your own specifications, such as the storage size or the MongoDB version. Once you run this Pulumi program, you'll have a MongoDB StatefulSet orchestrated by Kubernetes that can be used for stateful AI workloads, with its data persisted across pod restarts.