Analyzing AI Workload Expenses with AWS CUR
To analyze AI workload expenses, one efficient approach is to use the AWS Cost and Usage Report (CUR), which provides comprehensive data about your AWS costs and usage. The CUR lets you dive into the details of your AWS expenses, giving you line-item insight that can help optimize spending.
Here's a high-level overview of the steps we'll take in the program using Pulumi with Python to set up an infrastructure for analyzing AI workload expenses using AWS CUR:
- Define a Report Definition resource using AWS CUR to specify the data about AWS usage and costs you want to collect.
- Configure the necessary S3 bucket to store the report data.
- Set up permissions for AWS to write the CUR data to your S3 bucket.
The following Pulumi Python program sets up the AWS CUR for analyzing AI workload expenses:
```python
import pulumi
import pulumi_aws as aws

# Define the S3 bucket where we'd like our usage reports to be delivered.
report_bucket = aws.s3.Bucket("report-bucket", acl="private")

# Create a policy that allows the AWS Cost and Usage Reports service to
# verify the bucket configuration and write report files into it.
policy_document = pulumi.Output.all(report_bucket.arn).apply(lambda args: f"""{{
    "Version": "2012-10-17",
    "Statement": [
        {{
            "Effect": "Allow",
            "Principal": {{"Service": "billingreports.amazonaws.com"}},
            "Action": ["s3:GetBucketAcl", "s3:GetBucketPolicy"],
            "Resource": "{args[0]}"
        }},
        {{
            "Effect": "Allow",
            "Principal": {{"Service": "billingreports.amazonaws.com"}},
            "Action": "s3:PutObject",
            "Resource": "{args[0]}/*"
        }}
    ]
}}""")

policy = aws.s3.BucketPolicy("bucket-policy",
    bucket=report_bucket.id,
    policy=policy_document,
)

# Define an AWS Cost and Usage Report Definition.
# Note: the CUR API is only available in us-east-1, so the AWS provider
# must be configured for that region.
report_definition = aws.cur.ReportDefinition("report-definition",
    report_name="ai_workload_usage_report",
    time_unit="HOURLY",
    format="textORcsv",
    compression="GZIP",
    additional_schema_elements=["RESOURCES"],
    s3_bucket=report_bucket.id,
    s3_prefix="reports",
    s3_region="us-east-1",
    additional_artifacts=["REDSHIFT", "QUICKSIGHT"],
    refresh_closed_reports=True,
    report_versioning="CREATE_NEW_REPORT",
)

# Export the S3 bucket name where the reports will be stored.
pulumi.export("report_bucket_name", report_bucket.id)
```
Now, let's break down the code:
- We create an S3 bucket, `report_bucket`, which will be used to store the CUR data.
- We then build a policy document, `policy_document`, that explicitly allows the AWS Cost and Usage Reports service to deliver report files into our bucket, and attach it to the bucket using the `BucketPolicy` resource.
- Next, we set up the `ReportDefinition`. We specify a `report_name` that identifies the report within AWS, a `time_unit` of "HOURLY" for detailed granularity, and the `format`, set to "textORcsv" (the CUR value for text/CSV output). The `compression` option is set to "GZIP", and `additional_schema_elements` includes "RESOURCES", which adds resource IDs to each line item and is useful for attributing costs to specific resources.
- We define the S3 delivery details for the report: the `s3_bucket`, an `s3_prefix` to organize the reports within the bucket, and the `s3_region`, which should match the region of the S3 bucket.
- `additional_artifacts` includes "REDSHIFT" and "QUICKSIGHT", making the report data compatible with these additional AWS analysis services.
- We enable `refresh_closed_reports` to let AWS refresh the data after a billing period closes, and set `report_versioning` to "CREATE_NEW_REPORT" so each update is delivered as a new report rather than overwriting the previous one.
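Building the policy document as an f-string works, but it is easy to introduce JSON syntax errors that way. As a sketch of an alternative, the same document can be built with `json.dumps`, using the service principal AWS documents for CUR delivery (`billingreports.amazonaws.com`); the helper name here is just for illustration:

```python
import json


def cur_bucket_policy(bucket_arn: str) -> str:
    """Build the CUR delivery bucket policy as a JSON string.

    json.dumps guarantees the output is well-formed JSON, so there is
    no risk of a stray brace or comma breaking the policy.
    """
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                # Let the CUR service check the bucket's ACL and policy
                # before delivering reports.
                "Effect": "Allow",
                "Principal": {"Service": "billingreports.amazonaws.com"},
                "Action": ["s3:GetBucketAcl", "s3:GetBucketPolicy"],
                "Resource": bucket_arn,
            },
            {
                # Let the CUR service write report objects into the bucket.
                "Effect": "Allow",
                "Principal": {"Service": "billingreports.amazonaws.com"},
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    })
```

In the Pulumi program this could then be wired up as `policy_document = report_bucket.arn.apply(cur_bucket_policy)`.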
Finally, we export the S3 bucket name to make it accessible and easy to reference.
This only sets up the Cost and Usage Report; to fully analyze the data, you may need to set up additional resources or perform the analysis in an AWS service such as QuickSight, or with another analytics tool that can consume the CUR data.
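As a minimal sketch of consuming the delivered data directly, the following reads one gzipped CUR CSV file (as produced by the "textORcsv" + "GZIP" settings above) and sums cost per service. It assumes the standard CUR column headers `lineItem/ProductCode` and `lineItem/UnblendedCost`:

```python
import csv
import gzip
from collections import defaultdict


def cost_by_service(report_path: str) -> dict:
    """Sum unblended cost per product code from one gzipped CUR CSV file."""
    totals = defaultdict(float)
    with gzip.open(report_path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            # Empty cost cells are treated as zero.
            totals[row["lineItem/ProductCode"]] += float(
                row["lineItem/UnblendedCost"] or 0
            )
    return dict(totals)
```

Pointing this at a report file downloaded from the `reports/` prefix of the bucket yields a per-service breakdown, e.g. SageMaker versus EC2 charges for your AI workloads.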