Sensitive Data Identification with AWS Macie2 for AI Compliance.
AWS Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect sensitive data in AWS. As an AI-driven service, Macie automates the discovery of sensitive data at scale and lowers the cost of protecting your data.
To achieve sensitive data identification with AWS Macie for AI compliance using Pulumi, you'll need to set up several resources:
- Macie Account – A resource that represents your Amazon Macie account.
- Custom Data Identifier – Defines the criteria (using regex patterns, keywords, etc.) that Macie uses to identify sensitive data.
- Classification Job – A one-time or scheduled task that Macie performs to scan for sensitive data in specified S3 buckets.
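To make the custom data identifier criteria more concrete, here is a minimal standard-library sketch for trying out a pattern locally before handing it to Macie. It is only a rough approximation: it assumes a keyword must end no more than a fixed number of characters before the end of the regex match, and Macie's actual matching rules may differ. The names `CARD_LIKE`, `KEYWORDS`, `MAX_MATCH_DISTANCE`, and `find_candidates` are hypothetical and exist only for this illustration.

```python
import re

# Illustrative criteria: four groups of four digits separated by hyphens,
# plus two keywords and a proximity limit (all values are assumptions).
CARD_LIKE = re.compile(r"(\d{4}-){3}\d{4}")
KEYWORDS = ["confidential", "SSN"]
MAX_MATCH_DISTANCE = 50


def keyword_ends(text):
    """Return the end offsets of every case-insensitive keyword occurrence."""
    ends = []
    for keyword in KEYWORDS:
        for hit in re.finditer(re.escape(keyword), text, re.IGNORECASE):
            ends.append(hit.end())
    return ends


def find_candidates(text):
    """Return regex matches that have a keyword ending shortly before them.

    Approximation only: a match counts when some keyword ends at most
    MAX_MATCH_DISTANCE characters before the end of the match.
    """
    ends = keyword_ends(text)
    return [
        match.group(0)
        for match in CARD_LIKE.finditer(text)
        if any(0 <= match.end() - end <= MAX_MATCH_DISTANCE for end in ends)
    ]


if __name__ == "__main__":
    print(find_candidates("confidential card: 1234-5678-9012-3456"))
    print(find_candidates("no keyword here: 1234-5678-9012-3456"))
```

A quick local check like this helps catch patterns that are too loose or too strict, but the findings Macie reports remain the authoritative result.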
The following Pulumi Python program sets up Macie for sensitive data identification:
import pulumi
import pulumi_aws as aws

# Look up the current AWS account ID; it is needed in the bucket definition below
current = aws.get_caller_identity()

# Initialize a Macie account
macie_account = aws.macie2.Account("macie-account",
    status="ENABLED",
    finding_publishing_frequency="FIFTEEN_MINUTES",
)

# Create a custom data identifier to detect sensitive data based on a regex pattern
custom_data_identifier = aws.macie2.CustomDataIdentifier("custom-data-identifier",
    name="SensitiveDataIdentifier",
    description="Detect sensitive data like credit card numbers",
    regex=r"(\d{4}-){3}\d{4}",  # An example regex that looks like a credit card format
    keywords=["confidential", "SSN"],  # List of keywords to watch for
    maximum_match_distance=50,  # The maximum number of characters between occurrences of regex matches and keywords
    # Macie must be enabled before identifiers can be created
    opts=pulumi.ResourceOptions(depends_on=[macie_account]),
)

# Create a classification job to run Macie on the given S3 bucket.
# This example assumes you have S3 buckets already. Else, create them using Pulumi.
s3_bucket = aws.s3.Bucket.get("example-bucket", "example-bucket-name")

classification_job = aws.macie2.ClassificationJob("classification-job",
    job_type="ONE_TIME",  # Can be "ONE_TIME" or "SCHEDULED"
    custom_data_identifier_ids=[custom_data_identifier.id],
    s3_job_definition=aws.macie2.ClassificationJobS3JobDefinitionArgs(
        bucket_definitions=[
            aws.macie2.ClassificationJobS3JobDefinitionBucketDefinitionArgs(
                account_id=current.account_id,  # Account that owns the bucket
                buckets=[s3_bucket.id],  # For an S3 bucket, the resource ID is the bucket name
            )
        ]
    ),
    initial_run=True,
    job_status="RUNNING",
    # Macie must be enabled before jobs can be created
    opts=pulumi.ResourceOptions(depends_on=[macie_account]),
)

pulumi.export("macie_account_id", macie_account.id)
pulumi.export("custom_data_identifier_id", custom_data_identifier.id)
pulumi.export("classification_job_id", classification_job.id)
In this program, we start by enabling Macie for the AWS account using the `aws.macie2.Account` resource from Pulumi's AWS SDK. We set its status to `ENABLED` and set the frequency at which findings are published.
Then, we create a `CustomDataIdentifier` resource that uses a regex pattern to identify what looks like a credit card number and includes certain keywords as additional criteria for detecting sensitive data.
After that, we retrieve an existing S3 bucket (assuming one has already been provisioned) and create a `ClassificationJob`. The job uses the custom data identifier to scan the S3 bucket(s) for sensitive data. In this example, the job is configured to run one time (`ONE_TIME`). If you want the job to run on a schedule, change `job_type` to `SCHEDULED` and configure the schedule through the `schedule_frequency` argument.
Finally, we export the IDs of the created resources so that their status can be tracked and referenced outside of Pulumi, if necessary.
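For the scheduled variant, the arguments could be assembled along these lines. This is a sketch, not a definitive implementation: the helpers `scheduled_job_args` and `create_scheduled_job` are hypothetical, the weekly cadence is just one option (`daily_schedule` and `monthly_schedule` are the alternatives), and it assumes a `pulumi_aws` version that accepts plain dictionaries for nested arguments.

```python
VALID_DAYS = {
    "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
    "FRIDAY", "SATURDAY", "SUNDAY",
}


def scheduled_job_args(account_id, bucket_names, identifier_ids, day_of_week="MONDAY"):
    """Build keyword arguments for a weekly aws.macie2.ClassificationJob."""
    day = day_of_week.upper()
    if day not in VALID_DAYS:
        raise ValueError(f"unknown day of week: {day_of_week!r}")
    return {
        "job_type": "SCHEDULED",
        # weekly_schedule takes the day of the week on which the job runs.
        "schedule_frequency": {"weekly_schedule": day},
        # For a recurring job, also analyze existing objects right after creation.
        "initial_run": True,
        "custom_data_identifier_ids": list(identifier_ids),
        "s3_job_definition": {
            "bucket_definitions": [
                {"account_id": account_id, "buckets": list(bucket_names)},
            ],
        },
    }


def create_scheduled_job(resource_name, **kwargs):
    """Create the job inside a Pulumi program (hypothetical helper)."""
    # Imported lazily so the argument builder can be used without Pulumi installed.
    import pulumi_aws as aws

    return aws.macie2.ClassificationJob(resource_name, **scheduled_job_args(**kwargs))
```

Inside the Pulumi program, a call such as `create_scheduled_job("weekly-job", account_id=..., bucket_names=[s3_bucket.id], identifier_ids=[custom_data_identifier.id])` would take the place of the one-time job. Keeping the argument builder separate from the resource call makes the schedule settings easy to inspect without running a deployment.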
This program needs to be run in an environment where the AWS CLI and the Pulumi CLI are installed and configured with the credentials and permissions required to manage these resources. Once deployed with `pulumi up`, the program enables Macie, registers the custom data identifier, and starts the classification job, giving you an automated way to locate sensitive data in S3 as part of maintaining AI compliance.