1. Logging Access to AI Datasets with AWS CloudTrail


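    To log access to AI datasets with AWS CloudTrail, we'll set up a CloudTrail trail that records management events (control-plane operations such as creating or deleting resources) together with data events (object-level operations such as reading a dataset file from Amazon S3). Data events are not logged by default, so they must be enabled explicitly for the resources that hold your datasets. AWS CloudTrail is a service that records actions taken by a user, role, or AWS service. Each action is stored as a JSON event record that can be analyzed programmatically for security auditing and governance.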

    When a user, role, or service calls an AWS API, CloudTrail records the call as an event and periodically delivers batches of events as compressed log files to an Amazon S3 bucket. Because each event captures the caller's identity, the action, the affected resource, and a timestamp, the logs show who accessed which data and when, which makes them useful for auditing and for analyzing access patterns.
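    To make the "who, what, when" idea concrete, the sketch below reads one delivered log file and pulls those fields out of each record. It assumes the standard CloudTrail log format (a gzip-compressed JSON object with a top-level "Records" list, where each record has eventTime, eventSource, eventName, userIdentity, and requestParameters); the function name summarize_records is hypothetical, and the sketch is illustrative rather than a complete parser.

```python
import gzip
import json
from typing import Any, Dict, List


def summarize_records(log_file_bytes: bytes) -> List[Dict[str, Any]]:
    """Return who/what/when for each event in one CloudTrail log file.

    Assumes the usual layout: gzip-compressed JSON with a "Records" list.
    """
    payload = json.loads(gzip.decompress(log_file_bytes))
    summaries = []
    for record in payload.get("Records", []):
        identity = record.get("userIdentity", {})
        # requestParameters may be null in the JSON, so fall back to {}.
        params = record.get("requestParameters") or {}
        summaries.append({
            # Service-initiated events may carry only a "type", not an "arn".
            "who": identity.get("arn", identity.get("type", "unknown")),
            "what": f'{record.get("eventSource")}:{record.get("eventName")}',
            # For S3 data events, the bucket and object key are in requestParameters.
            "resource": "/".join(
                p for p in (params.get("bucketName"), params.get("key")) if p
            ),
            "when": record.get("eventTime"),
        })
    return summaries
```

    In practice you would feed it the bytes of an object downloaded from the trail's bucket; for large volumes, a query service over the log bucket is usually a better fit than per-file parsing.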

    Here's what we need to do:

    1. Create an S3 bucket to store the log files.
    2. Configure a CloudTrail trail that captures management events, plus data events for the S3 objects that hold the datasets, and delivers them to that bucket.
    3. (Optional) Also send trail events to a CloudWatch Logs log group. This enables near-real-time monitoring with metric filters and alarms, while S3 remains the durable archive.
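    The optional step 3 needs an IAM role that CloudTrail can assume to write into the log group. The sketch below only builds the two policy documents as JSON strings; the function name is hypothetical. In a Pulumi program, the trust policy would go to an aws.iam.Role's assume_role_policy and the permissions policy to an aws.iam.RolePolicy, after which the trail's cloud_watch_logs_role_arn and cloud_watch_logs_group_arn arguments point at the role and the log group (the group ARN passed to the trail typically needs a ":*" suffix).

```python
import json
from typing import Tuple


def cloudtrail_cloudwatch_policies(log_group_arn: str) -> Tuple[str, str]:
    """Build the trust and permissions policies for CloudTrail -> CloudWatch Logs.

    log_group_arn is the plain log group ARN, without a trailing ":*".
    """
    # Lets the CloudTrail service assume the delivery role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "cloudtrail.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    # Lets the role create log streams and put events in the group.
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": f"{log_group_arn}:log-stream:*",
        }],
    }
    return json.dumps(trust_policy), json.dumps(permissions_policy)
```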

    Below is a Python program that uses the Pulumi AWS provider (pulumi_aws) to set up CloudTrail logging for AI dataset access. It uses the aws.cloudtrail.Trail resource to create a trail that records events in the AWS region the stack is deployed to, and it stores the log files in a newly created S3 bucket.

    import json

    import pulumi
    import pulumi_aws as aws

    # Bucket that receives the CloudTrail log files.
    log_bucket = aws.s3.Bucket("cloudtrail-bucket")

    # Bucket holding the AI datasets whose object-level access we audit.
    # "ai-datasets" is a placeholder; point the trail at your real dataset bucket.
    dataset_bucket = aws.s3.Bucket("ai-datasets")

    # Bucket policy that allows CloudTrail to check the ACL and write log files.
    # .apply() on a single Output passes the plain ARN string, and json.dumps
    # avoids hand-escaping braces in an f-string.
    log_bucket_policy = aws.s3.BucketPolicy(
        "cloudtrail-bucket-policy",
        bucket=log_bucket.id,
        policy=log_bucket.arn.apply(lambda arn: json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {"Service": "cloudtrail.amazonaws.com"},
                    "Action": "s3:GetBucketAcl",
                    "Resource": arn,
                },
                {
                    "Effect": "Allow",
                    "Principal": {"Service": "cloudtrail.amazonaws.com"},
                    "Action": "s3:PutObject",
                    "Resource": f"{arn}/AWSLogs/*",
                    "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}},
                },
            ],
        })),
    )

    # CloudTrail trail: all management events plus S3 data events on the dataset bucket.
    trail = aws.cloudtrail.Trail(
        "trail",
        s3_bucket_name=log_bucket.id,
        include_global_service_events=True,
        event_selectors=[aws.cloudtrail.TrailEventSelectorArgs(
            read_write_type="All",
            include_management_events=True,
            data_resources=[
                aws.cloudtrail.TrailEventSelectorDataResourceArgs(
                    type="AWS::S3::Object",
                    # The trailing "/" selects every object in the bucket.
                    values=[dataset_bucket.arn.apply(lambda arn: f"{arn}/")],
                ),
                # Basic selectors accept only AWS::S3::Object, AWS::Lambda::Function,
                # and AWS::DynamoDB::Table; SageMaker types need advanced_event_selectors.
            ],
        )],
        # CloudTrail validates the bucket policy at creation, so create it first.
        opts=pulumi.ResourceOptions(depends_on=[log_bucket_policy]),
    )

    # Export the bucket names and the CloudTrail trail ARN.
    pulumi.export("s3_bucket_name", log_bucket.id)
    pulumi.export("dataset_bucket_name", dataset_bucket.id)
    pulumi.export("cloudtrail_trail_arn", trail.arn)

    In this program, we are doing the following:

    • We create one S3 bucket to store the CloudTrail log files and a second bucket that stands in for the bucket holding the AI datasets.
    • We attach a bucket policy that lets CloudTrail read the log bucket's ACL and write log objects under its AWSLogs/ prefix. The trail depends on this policy because CloudTrail checks it when the trail is created.
    • We configure a CloudTrail trail that records all read and write management events in the region where the Pulumi stack is deployed, plus global service events such as IAM. Its event selector also logs S3 data events for every object in the dataset bucket, which captures object-level operations such as GetObject and PutObject. Data events are deliberately not enabled on the log bucket itself, because CloudTrail's own log deliveries would then generate further events in a loop.
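    The bucket policy statements can also be factored into a plain function, which keeps them unit-testable and makes it easy to narrow the write permission to a single account's AWSLogs/<account-id>/ prefix. This is a sketch with a hypothetical function name; in Pulumi it could be passed to .apply() on the log bucket's arn output, in which case only the ARN argument is supplied and the broader AWSLogs/* prefix is used.

```python
import json
from typing import Optional


def cloudtrail_bucket_policy(bucket_arn: str, account_id: Optional[str] = None) -> str:
    """Return an S3 bucket policy that lets CloudTrail deliver log files.

    With account_id set, writes are limited to AWSLogs/<account_id>/.
    """
    log_prefix = f"AWSLogs/{account_id}/*" if account_id else "AWSLogs/*"
    cloudtrail = {"Service": "cloudtrail.amazonaws.com"}
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # CloudTrail checks the bucket ACL before delivering logs.
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": cloudtrail,
                "Action": "s3:GetBucketAcl",
                "Resource": bucket_arn,
            },
            {
                # CloudTrail writes log objects, granting the bucket owner full control.
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": cloudtrail,
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/{log_prefix}",
                "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}},
            },
        ],
    }
    return json.dumps(policy, indent=2)
```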

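    Depending on the AI services you use, such as Amazon SageMaker, you may need to log additional data event types. Basic event selectors only support S3 objects, Lambda functions, and DynamoDB tables; other resource types require advanced event selectors (the advanced_event_selectors argument of aws.cloudtrail.Trail), which cannot be combined with event_selectors on the same trail.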
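    A rough sketch of what advanced event selectors for dataset access might look like follows. The dictionaries use the field names of the CloudTrail PutEventSelectors API's AdvancedEventSelectors parameter (as accepted, for example, by boto3's put_event_selectors); in Pulumi the same fields map onto advanced_event_selectors. The function name is hypothetical, and the default AWS::SageMaker::Endpoint type is an assumed example, so check the CloudTrail data events documentation for the resource types your services actually support.

```python
from typing import Any, Dict, List, Sequence


def dataset_advanced_selectors(
    dataset_bucket_arns: Sequence[str],
    data_types: Sequence[str] = ("AWS::SageMaker::Endpoint",),
) -> List[Dict[str, Any]]:
    """Build advanced event selectors for AI-dataset access.

    Each extra data type gets its own selector, because a selector
    matches a single resources.type value.
    """
    # Keep management events: with advanced selectors, a trail logs only
    # what its selectors match.
    selectors: List[Dict[str, Any]] = [{
        "Name": "Management events",
        "FieldSelectors": [{"Field": "eventCategory", "Equals": ["Management"]}],
    }]
    if dataset_bucket_arns:
        selectors.append({
            "Name": "S3 dataset object access",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
                # A "<bucket-arn>/" prefix matches every object in the bucket.
                {"Field": "resources.ARN", "StartsWith": [f"{arn}/" for arn in dataset_bucket_arns]},
            ],
        })
    for data_type in data_types:
        selectors.append({
            "Name": f"{data_type} data events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": [data_type]},
            ],
        })
    return selectors
```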

    Before running the program, make sure the AWS CLI (or your AWS credentials) is configured with sufficient permissions and that the Pulumi CLI is installed. Save the script as __main__.py in a Pulumi Python project (for example, one created with pulumi new aws-python), then run pulumi up from that directory to deploy the resources to your AWS account.
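    After deployment, one way to confirm that logs are arriving is to list the log bucket under the prefix CloudTrail uses for a given day. The sketch below builds that prefix, assuming the documented key layout [key-prefix/]AWSLogs/<account-id>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/; the function name is hypothetical, and the result could be used with aws s3 ls or as the Prefix argument of boto3's list_objects_v2. The program above sets no key prefix, so the default empty key_prefix applies to it.

```python
from datetime import date


def cloudtrail_log_prefix(account_id: str, region: str, day: date, key_prefix: str = "") -> str:
    """Return the S3 key prefix under which CloudTrail stores one day's logs."""
    # date supports strftime-style format specs inside f-strings.
    base = f"AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"
    cleaned = key_prefix.strip("/")
    return f"{cleaned}/{base}" if cleaned else base
```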