MongoDB Atlas for Storing AI Training Datasets Securely
To store AI training datasets securely, you need a database service that is robust, scalable, and secure, and MongoDB Atlas is a good fit. MongoDB Atlas is a fully managed cloud database built by the team behind MongoDB, and it provides security and compliance features that help protect sensitive AI training datasets.
In the following Pulumi program, we will create a managed MongoDB Atlas Cluster, enable encryption at rest to secure the data, and configure auditing to track access and changes to your datasets.
Here are the steps we'll follow in the Pulumi program:
- Set up the MongoDB Atlas project: We'll start by creating a MongoDB Atlas project, which serves as a container for your MongoDB deployments.
- Provision a MongoDB cluster: We'll create a cluster in this project. The cluster is the core database where your data will reside.
- Encryption at rest: To secure your data, we'll enable encryption at rest. All data on disk will be encrypted, which adds a layer of protection if the underlying storage is ever exposed.
- Auditing: We'll configure auditing for the project that contains the cluster to track access and changes, which is good practice when dealing with sensitive data.
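The audit filter used in the auditing step is a JSON document passed to Atlas as a string, and hand-written JSON strings are easy to get wrong. As a minimal sketch, the filter could be built from a Python dictionary instead. The helper name build_audit_filter and its parameters are hypothetical and not part of Pulumi or the MongoDB Atlas provider; only the standard library is used.

```python
import json
from datetime import datetime, timezone


def build_audit_filter(action_types, since):
    """Return an Atlas audit filter as a JSON string.

    action_types: audit event types to record, e.g. "authenticate".
    since: timezone-aware datetime; only events at or after it match.
    """
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    # Format as a UTC timestamp with millisecond precision.
    ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    audit_filter = {
        "atype": {"$in": list(action_types)},
        "ts": {"$gte": {"$date": ts}},
    }
    return json.dumps(audit_filter)


filter_json = build_audit_filter(
    ["authenticate", "createCollection"],
    datetime(2020, 4, 20, tzinfo=timezone.utc),
)
print(filter_json)
```

The resulting string can be passed as the audit_filter argument of the auditing resource. Serializing with json.dumps guarantees the filter is at least syntactically valid JSON before it reaches Atlas.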
To use the following Pulumi program, you must have already installed Pulumi, set up the MongoDB Atlas provider, and configured your Pulumi CLI with the appropriate access keys for MongoDB Atlas. Here's the program:
    import pulumi
    import pulumi_mongodbatlas as mongodbatlas

    # Create a new MongoDB Atlas Project
    project = mongodbatlas.Project("my-project",
        org_id="yourMongoDBOrgId",  # Replace with your MongoDB organization id
        name="ai-datasets")

    # Create a MongoDB Cluster within the Project
    # This example creates an M10 cluster, which is suitable for development.
    # You might need different specifications based on your workloads.
    cluster = mongodbatlas.Cluster("my-cluster",
        project_id=project.id,
        name="ai-cluster",
        cluster_type="REPLICASET",
        replication_specs=[{
            "num_shards": 1,
            "regions_configs": [
                {
                    "region_name": "US_EAST_1",
                    "priority": 7,
                    "electable_nodes": 3,
                    "read_only_nodes": 0,
                    "analytics_nodes": 0
                }
            ]
        }],
        provider_name="AWS",
        provider_disk_iops=100,
        provider_instance_size_name="M10",
        provider_region_name="US_EAST_1",
        disk_size_gb=10,
        backup_enabled=True,
        mongo_db_major_version="4.4")

    # Enable encryption at rest for the project using an AWS KMS key
    encryption_at_rest = mongodbatlas.EncryptionAtRest("encryption-at-rest",
        project_id=project.id,
        aws_kms_config={
            "enabled": True,  # Without this flag the KMS configuration is not activated
            "access_key_id": "yourAWSAccessKeyId",  # Replace with your AWS access key ID
            "secret_access_key": "yourAWSSecretAccessKey",  # Replace with your AWS secret key
            "customer_master_key_id": "yourAWSKMSKey",  # Replace with your AWS KMS key for encryption
            "region": "US_EAST_1"  # Region of the KMS key; assumed here to match the cluster
        })

    # Configure auditing for the project that contains the cluster
    auditing = mongodbatlas.Auditing("auditing",
        project_id=project.id,
        enabled=True,  # Turn auditing on; the filter alone does not enable it
        audit_filter='{ "atype": { "$in": [ "authenticate", "createCollection" ] }, "ts": { "$gte": { "$date": "2020-04-20T00:00:00.000Z" } } }')

    # Export the cluster ID and connection strings
    pulumi.export("cluster_id", cluster.id)
    pulumi.export("connection_strings", cluster.connection_strings)
Be sure to replace the placeholders yourMongoDBOrgId, yourAWSAccessKeyId, yourAWSSecretAccessKey, and yourAWSKMSKey with your actual MongoDB organization ID and AWS credentials. For production use, these values should not be hardcoded; inject them from a secrets manager instead.

This program sets up a secure MongoDB Atlas environment suitable for storing AI training datasets. The cluster is created in the specified project, the data on disk is encrypted using your AWS KMS key, and an auditing configuration tracks critical operations on your databases.
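As one way to follow the advice about not hardcoding credentials, the sketch below reads the required values from environment variables and fails early if any are missing. The helper names require_env and load_atlas_settings and the variable names ATLAS_ORG_ID and ATLAS_KMS_KEY_ID are hypothetical, chosen only for illustration; a secrets manager or CI system would be expected to populate the environment.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_atlas_settings(env_names):
    """Map setting names to values read from the environment."""
    return {key: require_env(var) for key, var in env_names.items()}


# Hypothetical variable names; choose ones that match your own setup.
ENV_NAMES = {
    "org_id": "ATLAS_ORG_ID",
    "customer_master_key_id": "ATLAS_KMS_KEY_ID",
}

# Usage inside the program, once the variables are set:
#   settings = load_atlas_settings(ENV_NAMES)
#   project = mongodbatlas.Project("my-project", org_id=settings["org_id"], ...)
```

Inside a Pulumi program, pulumi.Config().require_secret("someKey") is a Pulumi-native alternative: it reads a value stored with the pulumi config set --secret command and keeps it marked as a secret in the stack's state.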