Securing Machine Learning Artifacts on S3

Question

Pulumi · Accepted Answer

To secure machine learning artifacts on Amazon S3, you need to consider encryption, access control, and versioning. This Pulumi program creates an S3 bucket with server-side encryption backed by an AWS Key Management Service (KMS) key, enables versioning so that earlier object versions can be recovered, and attaches a bucket policy that rejects uploads not encrypted with that key.

Here is a general structure of the steps we'll take:
1. Create an S3 bucket to store your machine learning artifacts.
2. Enable server-side encryption on the S3 bucket using KMS.
3. Turn on versioning for the S3 bucket to keep track of and recover previous versions of your artifacts.
4. Apply a bucket policy that rejects uploads not encrypted with the KMS key.

Below is the Pulumi program written in Python that accomplishes these goals:

```python
import json

import pulumi
import pulumi_aws as aws

# Create a customer-managed KMS key for encrypting objects in the bucket
kms_key = aws.kms.Key("my-key",
                      description="KMS key for S3 machine learning artifacts",
                      deletion_window_in_days=10)

# Create the S3 bucket with versioning and default SSE-KMS encryption
ml_artifacts_bucket = aws.s3.Bucket("ml-artifacts-bucket",
    acl="private",  # No public or cross-account grants through the ACL
    versioning=aws.s3.BucketVersioningArgs(
        enabled=True,  # Keep earlier versions of each artifact
    ),
    server_side_encryption_configuration=aws.s3.BucketServerSideEncryptionConfigurationArgs(
        rule=aws.s3.BucketServerSideEncryptionConfigurationRuleArgs(
            apply_server_side_encryption_by_default=aws.s3.BucketServerSideEncryptionConfigurationRuleApplyServerSideEncryptionByDefaultArgs(
                sse_algorithm="aws:kms",
                kms_master_key_id=kms_key.arn,  # Same ARN the bucket policy checks
            ),
        ),
    ))


def build_encryption_policy(args):
    """Return a JSON policy that denies uploads not encrypted with our KMS key."""
    bucket_arn, key_arn = args

    def deny_put(sid, condition):
        return {
            "Sid": sid,
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"{bucket_arn}/*",
            "Condition": condition,
        }

    # Keys inside one condition operator are ANDed, so each check gets its own
    # statement; otherwise an upload using aws:kms with a different key would pass.
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            deny_put("DenyIncorrectEncryptionHeader",
                     {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}}),
            deny_put("DenyIncorrectKmsKey",
                     {"StringNotEquals": {"s3:x-amz-server-side-encryption-aws-kms-key-id": key_arn}}),
            deny_put("DenyUnEncryptedObjectUploads",
                     {"Null": {"s3:x-amz-server-side-encryption": "true"}}),
        ],
    })


# Bucket policy to enforce the use of the KMS key for uploads to this bucket
bucket_policy = aws.s3.BucketPolicy("bucket-policy",
    bucket=ml_artifacts_bucket.id,
    policy=pulumi.Output.all(ml_artifacts_bucket.arn, kms_key.arn).apply(build_encryption_policy))

# Export the bucket name and KMS key ARN for reference
pulumi.export('bucket_name', ml_artifacts_bucket.id)
pulumi.export('kms_key_arn', kms_key.arn)
```

In this program, we first create a customer-managed KMS key for encrypting the objects in the bucket. Compared with the default S3-managed keys (SSE-S3), this lets you control who may use the key through its key policy and audit its use in AWS CloudTrail.

Next, we create the S3 bucket with default server-side encryption set to the KMS key. We also enable versioning, so that an overwritten or deleted artifact can be restored from an earlier version.
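To illustrate how versioning supports recovery, here is a minimal sketch that picks the most recent non-current version of an artifact. The helper name `previous_version_id`, the object key, and the sample data are hypothetical; the dictionary is hand-written to resemble the shape of a boto3 `list_object_versions` response (`Versions`, `Key`, `VersionId`, `IsLatest`, `LastModified`).

```python
from datetime import datetime, timezone
from typing import Optional


def previous_version_id(response: dict, key: str) -> Optional[str]:
    """Return the VersionId of the newest non-current version of `key`, or None."""
    candidates = [
        version for version in response.get("Versions", [])
        if version["Key"] == key and not version["IsLatest"]
    ]
    if not candidates:
        return None
    # The newest non-current version is the one that was replaced most recently.
    return max(candidates, key=lambda version: version["LastModified"])["VersionId"]


# Hand-written sample shaped like a list_object_versions response (assumption).
sample_response = {
    "Versions": [
        {"Key": "models/model.pkl", "VersionId": "v3", "IsLatest": True,
         "LastModified": datetime(2024, 3, 1, tzinfo=timezone.utc)},
        {"Key": "models/model.pkl", "VersionId": "v1", "IsLatest": False,
         "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"Key": "models/model.pkl", "VersionId": "v2", "IsLatest": False,
         "LastModified": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    ]
}

print(previous_version_id(sample_response, "models/model.pkl"))
```

Against a real bucket, the response would typically come from `list_object_versions(Bucket=..., Prefix=...)` on a boto3 S3 client, and the returned ID could be passed as `VersionId` to `get_object` to fetch the earlier artifact.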

Finally, we add a bucket policy that denies any `s3:PutObject` request that does not ask for SSE-KMS encryption with our key, so every uploaded object is encrypted with that key and unencrypted uploads are rejected. Because S3 evaluates the bucket policy before it applies the bucket's default encryption, clients must send the encryption headers explicitly on each upload.
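As a rough sketch of what a compliant upload looks like from the client side, the helper below builds the arguments for boto3's `put_object` so that the request carries both encryption headers the policy checks. The function names `sse_kms_put_kwargs` and `upload_artifact` are hypothetical, and the bucket name and key ARN are expected to come from the stack outputs.

```python
def sse_kms_put_kwargs(bucket: str, key: str, body: bytes, kms_key_arn: str) -> dict:
    """Build put_object arguments that request SSE-KMS with a specific key."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Sent as the x-amz-server-side-encryption header
        "ServerSideEncryption": "aws:kms",
        # Sent as the x-amz-server-side-encryption-aws-kms-key-id header
        "SSEKMSKeyId": kms_key_arn,
    }


def upload_artifact(bucket: str, key: str, body: bytes, kms_key_arn: str) -> dict:
    """Upload one artifact; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helper above has no dependencies

    s3 = boto3.client("s3")
    return s3.put_object(**sse_kms_put_kwargs(bucket, key, body, kms_key_arn))
```

Passing the full key ARN matters here: the policy compares the header value as a string, so a bare key ID or alias would not match the ARN in the condition.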

By exporting the bucket name and the KMS key ARN, we can reference them easily outside of Pulumi, for instance in CI/CD pipelines or other automated systems.
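One way a pipeline script might pick up those exports is to call `pulumi stack output --json` and parse the result. This is a sketch under the assumption that the script runs inside the Pulumi project directory with the right stack selected; the function names are hypothetical.

```python
import json
import subprocess


def parse_stack_outputs(raw_json: str) -> dict:
    """Extract the two exported values from the CLI's JSON output."""
    outputs = json.loads(raw_json)
    return {
        "bucket_name": outputs["bucket_name"],
        "kms_key_arn": outputs["kms_key_arn"],
    }


def read_stack_outputs() -> dict:
    """Run the Pulumi CLI and return the exported bucket name and key ARN."""
    result = subprocess.run(
        ["pulumi", "stack", "output", "--json"],
        check=True,           # raise if the CLI exits with an error
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return parse_stack_outputs(result.stdout)
```

Keeping the parsing separate from the CLI call makes the script easy to check without a deployed stack.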

Run this program with the Pulumi CLI. After configuring your AWS credentials, create a stack with `pulumi stack init` and deploy it with `pulumi up`. Once the deployment completes, artifacts uploaded to the bucket are versioned and encrypted with the KMS key.