Sharing ML Datasets Across Accounts with AWS RAM
You can share machine learning datasets across AWS accounts with Pulumi and AWS Resource Access Manager (RAM). AWS RAM lets you share supported resources with individual AWS accounts or within your AWS Organization. It simplifies cross-account sharing while the resources stay under the owning account's control, which helps with security and governance.
The general process of sharing ML datasets across accounts using AWS RAM involves:
- Creating a resource share in AWS RAM and adding the resources that hold or describe the datasets (RAM only accepts supported resource types).
- Associating the Resource Share with the AWS account(s) that you want to share the datasets with.
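- Accepting the resource share in the receiving account(s). This applies to shares that arrive as invitations, such as shares with accounts outside your organization; accounts in an organization that has sharing with AWS Organizations enabled in RAM get access without an invitation.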
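For step 2, it can help to check the principal strings before creating any associations. The sketch below is a hypothetical helper, `classify_principal`, not part of any AWS or Pulumi API. It assumes a principal is either a 12-digit account ID or an AWS Organizations organization or OU ARN in the standard `aws` partition; RAM accepts other principal forms for some resource types, and this simplified check reports those as `unknown`.

```python
import re

# Hypothetical helper: a simplified classification of RAM principal strings.
# Assumed formats (standard "aws" partition only):
#   account ID:   12 digits, e.g. "123456789012"
#   organization: arn:aws:organizations::<account>:organization/o-<id>
#   OU:           arn:aws:organizations::<account>:ou/o-<id>/ou-<root>-<id>
ACCOUNT_RE = re.compile(r"[0-9]{12}")
ORG_RE = re.compile(r"arn:aws:organizations::[0-9]{12}:organization/o-[a-z0-9]+")
OU_RE = re.compile(
    r"arn:aws:organizations::[0-9]{12}:ou/o-[a-z0-9]+/ou-[a-z0-9]+-[a-z0-9]+"
)


def classify_principal(principal: str) -> str:
    """Return 'account', 'organization', 'ou', or 'unknown'."""
    # fullmatch ensures the whole string matches, not just a prefix.
    if ACCOUNT_RE.fullmatch(principal):
        return "account"
    if ORG_RE.fullmatch(principal):
        return "organization"
    if OU_RE.fullmatch(principal):
        return "ou"
    return "unknown"
```

For example, `classify_principal("account_id")` returns `"unknown"`, a reminder that the placeholder in the program must be replaced with a real account ID before deployment.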
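The Pulumi program below demonstrates the first two steps from the owning account. It assumes you already have an S3 bucket holding your ML datasets. Be aware that standard S3 buckets are not a resource type AWS RAM can share, so the bucket association will fail as written; to use the same pattern, substitute the ARN of a RAM-shareable resource (for example, an AWS Glue Data Catalog table that describes the datasets), or grant access to the bucket with a bucket policy instead: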
```python
import pulumi
import pulumi_aws as aws

# Create a RAM resource share.
ml_datasets_share = aws.ram.ResourceShare(
    "mlDatasetsShare",
    allow_external_principals=True,  # Allow sharing with accounts outside your organization.
    tags={"Name": "MLDatasetsShare"},
)

# Look up the existing S3 bucket where the ML datasets are stored and read its ARN.
# Note: Replace "bucket_name" with the actual bucket name.
ml_datasets_bucket_arn = aws.s3.Bucket.get("existing-datasets-bucket", "bucket_name").arn

# Associate the resource with the RAM resource share.
# Caveat: RAM only accepts supported resource types, and a standard S3 bucket is
# not one of them; substitute the ARN of a RAM-shareable resource here.
bucket_association = aws.ram.ResourceAssociation(
    "bucketAssociation",
    resource_arn=ml_datasets_bucket_arn,
    resource_share_arn=ml_datasets_share.arn,
)

# Share the resource with another AWS account.
# Note: Replace "account_id" with the 12-digit ID of the account to share with.
principal_association = aws.ram.PrincipalAssociation(
    "principalAssociation",
    principal="account_id",  # AWS account ID.
    resource_share_arn=ml_datasets_share.arn,
)

# The receiving account accepts the share through the AWS RAM console or API.

# Export the resource share ARN, which the receiving account needs to accept the share.
pulumi.export("ml_datasets_share_arn", ml_datasets_share.arn)
```
Here's what each part of the program does:
- `ResourceShare`: The `aws.ram.ResourceShare` resource creates the resource share that will contain the AWS resources you want to share with another account. The `allow_external_principals` argument allows accounts outside your organization to be associated with the share.
- `Bucket.get`: This looks up the existing S3 bucket where the ML datasets are stored and reads its ARN (Amazon Resource Name). Replace the placeholder `bucket_name` with the actual bucket name. This bucket is the resource the example intends to share.
- `ResourceAssociation`: The `aws.ram.ResourceAssociation` resource adds the resource to the RAM resource share by linking the resource's ARN to the share's ARN.
- `PrincipalAssociation`: With `aws.ram.PrincipalAssociation`, you specify which AWS account receives the datasets by providing its account ID. Each association takes a single principal, so create one per account.
- Export: Finally, `pulumi.export` outputs the ARN of the resource share. The receiving AWS account needs this ARN to accept the shared resources.
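Both the bucket ARN read through `Bucket.get` and the exported share ARN follow the general `arn:partition:service:region:account-id:resource` layout. The sketch below uses hypothetical helpers, `parse_arn` and `s3_bucket_arn`, to sanity-check an ARN before associating it; for instance, an S3 bucket ARN has empty region and account fields. It assumes the standard `aws` partition when building bucket ARNs, and the share ARN in the usage note is made up.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its components.

    The resource part may itself contain colons, so split at most 5 times.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def s3_bucket_arn(bucket_name: str) -> str:
    """Build the ARN of an S3 bucket in the standard 'aws' partition."""
    return f"arn:aws:s3:::{bucket_name}"
```

For example, `parse_arn(s3_bucket_arn("my-ml-datasets")).service` is `"s3"`, while `parse_arn("arn:aws:ram:us-east-1:111122223333:resource-share/example").service` is `"ram"`.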
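Run the program above only in the account that owns the resources. To complete the sharing process, the receiving account accepts the invitation in the AWS RAM console, with the AWS CLI (for example, finding the pending invitation with `aws ram get-resource-share-invitations --resource-share-arns <share ARN>` and accepting it with `aws ram accept-resource-share-invitation`), or, if it also uses Pulumi, with an `aws.ram.ResourceShareAccepter` resource whose `share_arn` is the ARN exported by this program.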
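Standard S3 buckets are not among the resource types AWS RAM can share, so the usual way to give another account read access to a dataset bucket is a bucket policy on the bucket itself. The sketch below builds such a policy document as JSON; the function name `cross_account_read_policy`, the statement IDs, and the read-only action set (`s3:ListBucket` on the bucket, `s3:GetObject` on its objects) are illustrative assumptions rather than a prescribed setup. In a Pulumi program, the resulting string could be passed as the `policy` of an `aws.s3.BucketPolicy` resource. The receiving account still has to grant its own IAM users or roles permission to use the bucket.

```python
import json


def cross_account_read_policy(bucket_name: str, reader_account_ids: list[str]) -> str:
    """Return a bucket policy (JSON) granting read-only access to other accounts.

    Hypothetical helper; assumes the standard "aws" partition.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    # "root" principals delegate to IAM identities within each reader account.
    principals = [f"arn:aws:iam::{acct}:root" for acct in reader_account_ids]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARN itself.
                "Sid": "AllowCrossAccountList",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Reading objects applies to the object ARNs under the bucket.
                "Sid": "AllowCrossAccountGet",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```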