Storage Analysis for Cost Optimization with AWS S3 Inventory

Pulumi · Accepted Answer

To analyze and optimize Amazon S3 storage costs, you can use S3 Inventory, which delivers a daily or weekly report in CSV, ORC, or Parquet format listing the objects in a bucket and their metadata. These reports help you find cost-saving opportunities, such as deleting old or unnecessary objects or transitioning objects to cheaper storage classes.

When you configure S3 Inventory for your bucket, you specify a destination bucket for the inventory data, the format of the inventory file, and various other options, including whether to include all versions of each object or just the current versions.

Below is a Pulumi program in Python that creates a source bucket and a destination bucket, then enables S3 Inventory on the source bucket for storage cost analysis:

```python
import pulumi
import pulumi_aws as aws

# Create an AWS S3 bucket where the inventory will be stored
inventory_bucket = aws.s3.Bucket("inventoryBucket")

# Create an AWS S3 bucket that you want to analyze
source_bucket = aws.s3.Bucket("sourceBucket")

# Configure S3 Inventory for the source bucket
inventory_configuration = aws.s3.Inventory(
    "s3Inventory",
    bucket=source_bucket.id,
    destination={
        "bucket": {
            "format": "ORC",  # One of "CSV", "ORC", or "Parquet"
            "bucket_arn": inventory_bucket.arn,
            "prefix": "inventory",  # Key prefix for the inventory reports
        },
    },
    included_object_versions="Current",  # Or "All" to include noncurrent versions
    schedule={
        "frequency": "Daily",  # Or "Weekly"
    },
    # Metadata columns needed for cost analysis. Without optional fields,
    # the report lists little more than each object's bucket and key.
    optional_fields=["Size", "LastModifiedDate", "StorageClass"],
    # To limit the report to part of the bucket, add a prefix filter,
    # for example filter={"prefix": "logs/"}.
)

# Export the names of the buckets
pulumi.export("inventory_bucket_name", inventory_bucket.id)
pulumi.export("source_bucket_name", source_bucket.id)
```

This program does the following:

1. Creates two S3 buckets: one to store the inventory (`inventoryBucket`) and one that we're going to analyze (`sourceBucket`).
2. Sets up S3 Inventory on the `sourceBucket` with the following configurations:
   - The inventory will be in ORC, a columnar format that is efficient to query with tools such as Amazon Athena. You can also choose CSV or Parquet if preferred.
   - Inventory files will be stored in the `inventoryBucket` with a prefix of "inventory".
   - Only the current versions of objects are included, but you could configure it to include all versions.
   - The inventory list will be generated daily, which can also be set to weekly if you prefer less frequent updates.
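For S3 to deliver reports into the destination bucket, that bucket generally needs a policy allowing the `s3.amazonaws.com` service principal to write objects. The program above does not create one. Below is a minimal sketch of such a policy document, built with the standard library only; the function name, the `Sid`, and the placeholder ARNs and account ID are illustrative assumptions, not part of the original program.

```python
import json


def inventory_destination_policy(source_bucket_arn: str,
                                 destination_bucket_arn: str,
                                 account_id: str) -> str:
    """Build a bucket policy that lets S3 write inventory reports.

    The policy is meant to be attached to the destination bucket.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowS3InventoryDelivery",  # hypothetical statement ID
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"{destination_bucket_arn}/*",
            "Condition": {
                # Only accept writes that originate from the source bucket
                # in the expected account.
                "ArnLike": {"aws:SourceArn": source_bucket_arn},
                "StringEquals": {
                    "aws:SourceAccount": account_id,
                    "s3:x-amz-acl": "bucket-owner-full-control",
                },
            },
        }],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARNs and account ID, for illustration only.
print(inventory_destination_policy(
    "arn:aws:s3:::example-source-bucket",
    "arn:aws:s3:::example-inventory-bucket",
    "123456789012",
))
```

In a Pulumi program the bucket ARNs are outputs rather than plain strings, so one way to wire this in would be to combine them with `pulumi.Output.all(...).apply(...)` and pass the resulting JSON to an `aws.s3.BucketPolicy` resource on the destination bucket.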

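S3 bucket names must be globally unique. By default, Pulumi derives each physical bucket name from the logical name (`inventoryBucket`, `sourceBucket`) and appends a random suffix, so the program runs as written. If you set an explicit name with the `bucket` argument, that name must itself be globally unique.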

Finally, the program exports the names of the buckets so that you can easily find them in the AWS Management Console or when using the AWS CLI.

Please replace the bucket names and other settings with your actual desired configuration before running this program. By using Pulumi, you can easily modify and extend this program to include additional features such as bucket policies, lifecycle rules, and more, aligning with your cost optimization goals and compliance requirements.
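Once reports start arriving, a first pass at the analysis can be as simple as totaling bytes per storage class and flagging objects that have not changed in a while. The sketch below assumes a CSV-format report (the program above requests ORC, which would need a columnar reader instead), a column order of bucket, key, size, last-modified date, and storage class, and an ISO 8601 timestamp with milliseconds. The function name, the 90-day threshold, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed column order. The real order is listed under "fileSchema" in the
# manifest.json that S3 delivers alongside each inventory report.
SCHEMA = ["Bucket", "Key", "Size", "LastModifiedDate", "StorageClass"]


def summarize_inventory(csv_text, now, stale_after_days=90):
    """Return (bytes per storage class, keys of stale STANDARD objects)."""
    cutoff = now - timedelta(days=stale_after_days)
    bytes_by_class = defaultdict(int)
    stale_keys = []
    # Inventory CSV files have no header row, so supply the field names.
    for row in csv.DictReader(io.StringIO(csv_text), fieldnames=SCHEMA):
        size = int(row["Size"] or 0)
        modified = datetime.strptime(
            row["LastModifiedDate"], "%Y-%m-%dT%H:%M:%S.%fZ"
        ).replace(tzinfo=timezone.utc)
        bytes_by_class[row["StorageClass"]] += size
        # Old objects still in STANDARD are candidates for a cheaper class.
        if row["StorageClass"] == "STANDARD" and modified < cutoff:
            stale_keys.append(row["Key"])
    return dict(bytes_by_class), stale_keys


# Made-up sample rows standing in for a downloaded inventory file.
SAMPLE = (
    '"example-source-bucket","logs/2023/app.log","1048576","2023-01-15T08:00:00.000Z","STANDARD"\n'
    '"example-source-bucket","img/logo.png","2048","2024-05-30T12:30:00.000Z","STANDARD"\n'
    '"example-source-bucket","archive/old.tar","5242880","2022-11-01T00:00:00.000Z","GLACIER"\n'
)

totals, stale = summarize_inventory(
    SAMPLE, now=datetime(2024, 6, 1, tzinfo=timezone.utc)
)
print(totals)  # {'STANDARD': 1050624, 'GLACIER': 5242880}
print(stale)   # ['logs/2023/app.log']
```

Passing `now` explicitly keeps the result reproducible. In real reports the object keys are URL-encoded, so they may need decoding before they are used in follow-up API calls.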