1. FSx for Lustre as an AI Training Data Repository


    To set up an Amazon FSx for Lustre file system as an AI training data repository, you'd use Pulumi to provision and configure the required resources. FSx for Lustre is a high-performance file system optimized for workloads such as machine learning, high-performance computing (HPC), video processing, and financial modeling.

    We will provision an FSx for Lustre file system with the following steps:

    1. Define the Amazon VPC and subnet where FSx for Lustre will be deployed. FSx for Lustre must live in a VPC subnet to be reachable; if you would rather not hard-code IDs, see the lookup sketch after this list.
    2. Set up an FSx for Lustre file system in the subnet.
    3. Optionally, configure Amazon S3 integration if you want your data to be importable from and exportable to an S3 bucket.
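    As an alternative to hard-coding IDs, the AWS provider can look up existing networking resources for you. A minimal sketch, assuming you want to reuse the default VPC (the variable names here are illustrative):

        import pulumi_aws as aws

        # Look up the default VPC and its subnets instead of hard-coding IDs.
        default_vpc = aws.ec2.get_vpc(default=True)
        default_subnets = aws.ec2.get_subnets(
            filters=[aws.ec2.GetSubnetsFilterArgs(name="vpc-id", values=[default_vpc.id])]
        )

        # FSx for Lustre is deployed into a single subnet; pick the first one.
        subnet_id = default_subnets.ids[0]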

    For the sake of simplicity, I'll provide a basic example that assumes you have an S3 bucket ready for integration and focuses on setting up the FSx for Lustre file system. In a real-world scenario, you would likely have additional considerations such as security group setup, IAM roles, and so on.
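    For example, clients can only reach the file system if its security group allows Lustre traffic, which uses TCP port 988 plus ports 1018-1023. A minimal sketch; the resource name and CIDR ranges are placeholders you would adapt to your network:

        import pulumi_aws as aws

        # Security group permitting Lustre client traffic within the VPC.
        lustre_sg = aws.ec2.SecurityGroup(
            "fsxLustreSecurityGroup",
            vpc_id="vpc-12345",  # placeholder VPC ID
            ingress=[
                # Lustre uses TCP 988 and 1018-1023 for client/server communication.
                aws.ec2.SecurityGroupIngressArgs(
                    protocol="tcp", from_port=988, to_port=988,
                    cidr_blocks=["10.0.0.0/16"],  # assumed VPC CIDR; tighten as needed
                ),
                aws.ec2.SecurityGroupIngressArgs(
                    protocol="tcp", from_port=1018, to_port=1023,
                    cidr_blocks=["10.0.0.0/16"],
                ),
            ],
            egress=[
                aws.ec2.SecurityGroupEgressArgs(
                    protocol="-1", from_port=0, to_port=0,
                    cidr_blocks=["0.0.0.0/0"],
                ),
            ],
        )

    Its ID could then be passed to the file system through the security_group_ids argument of aws.fsx.LustreFileSystem.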

    Below is a Pulumi program that demonstrates how to create an FSx for Lustre file system using the aws.fsx.LustreFileSystem resource:

        import pulumi
        import pulumi_aws as aws

        # Pre-existing entities assumed to be in place for this configuration.
        # The VPC where you wish to deploy your FSx for Lustre file system
        # (the subnet below must belong to it).
        vpc_id = "vpc-12345"

        # The subnet inside the VPC where FSx will live.
        subnet_id = "subnet-56789"

        # The S3 bucket to associate with the file system for import/export.
        your_s3_bucket_name_for_import_export = "your-s3-bucket-name"

        # Create an FSx for Lustre file system.
        fsx_lustre_filesystem = aws.fsx.LustreFileSystem(
            "fsxLustreFilesystem",
            # Attach the Lustre file system to the VPC subnet.
            subnet_ids=[subnet_id],
            # Minimum storage capacity for Lustre in gibibytes (GiB) is 1200.
            storage_capacity=1200,
            # S3 bucket to be associated with the FSx file system.
            import_path=f"s3://{your_s3_bucket_name_for_import_export}",
            # Path on S3 where exported files will be saved.
            export_path=f"s3://{your_s3_bucket_name_for_import_export}/export",
            # Deployment type (SCRATCH_1 | SCRATCH_2 | PERSISTENT_1 | PERSISTENT_2).
            # SCRATCH_2 offers temporary, high-throughput storage.
            deployment_type="SCRATCH_2",
            # Maintenance start time expressed in the format d:HH:MM.
            weekly_maintenance_start_time="2:03:00",
            # Note: per_unit_storage_throughput and the automatic backup settings
            # are only valid for PERSISTENT deployment types; see the variant below.
        )

        # Export the FSx for Lustre file system's DNS name and mount name.
        # The DNS name is used to mount the file system from an EC2 instance,
        # and the mount name is a unique identifier for the file system.
        pulumi.export('fsx_lustre_dns_name', fsx_lustre_filesystem.dns_name)
        pulumi.export('fsx_lustre_mount_name', fsx_lustre_filesystem.mount_name)

    In this Pulumi Python program, the following actions occur:

    • An FSx for Lustre file system named fsxLustreFilesystem is created.
    • We specify the subnet_ids where the FSx file system should be deployed.
    • A storage capacity of 1200 GiB is defined, which is the minimum necessary for FSx for Lustre.
    • The export_path and import_path are set up for S3 integration. Set the your_s3_bucket_name_for_import_export variable to the actual bucket name you want to use.
    • The deployment_type is set to SCRATCH_2, which suits temporary, high-throughput storage such as scratch space for training runs.
    • per_unit_storage_throughput (throughput per tebibyte of storage, in MB/s) applies only to PERSISTENT deployment types, so it is omitted here; the variant after this list shows where it fits.
    • The weekly_maintenance_start_time controls when maintenance occurs. Automatic backups (automatic_backup_retention_days and daily_automatic_backup_start_time) are likewise only available on PERSISTENT deployments.
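    If you need durable storage with provisioned throughput and automatic backups, a PERSISTENT_1 deployment accepts those arguments. A minimal sketch, reusing the subnet_id placeholder from the program above:

        # PERSISTENT_1 variant: durable storage with provisioned throughput
        # and automatic backups (not available on SCRATCH deployments).
        fsx_lustre_persistent = aws.fsx.LustreFileSystem(
            "fsxLustrePersistent",
            subnet_ids=[subnet_id],
            storage_capacity=1200,
            deployment_type="PERSISTENT_1",
            # Throughput per tebibyte of storage in MB/s (50, 100, or 200 for SSD).
            per_unit_storage_throughput=200,
            # Retention period for automatic backups, in days.
            automatic_backup_retention_days=7,
            # Time of day (UTC, HH:MM) when automatic backups are taken.
            daily_automatic_backup_start_time="04:00",
        )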

    Remember to replace the placeholder values such as vpc-12345, subnet-56789, and your_s3_bucket_name_for_import_export with your actual VPC, subnet, and S3 bucket information.

    After deployment, the DNS name and mount name of the FSx file system are exported. Both values are needed to connect to the file system, for example from an Amazon EC2 instance running your AI training workloads.
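    Since the two values are always used together, one convenient pattern is to export a ready-to-run mount command assembled from both outputs. A sketch, assuming the Lustre client is installed on the instance and /mnt/fsx is your chosen mount point:

        # Assemble the Lustre mount command from the file system's outputs.
        # /mnt/fsx is an arbitrary mount point on the client instance.
        mount_command = pulumi.Output.concat(
            "sudo mount -t lustre ",
            fsx_lustre_filesystem.dns_name,
            "@tcp:/",
            fsx_lustre_filesystem.mount_name,
            " /mnt/fsx",
        )
        pulumi.export('fsx_lustre_mount_command', mount_command)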

    Documentation links for more details:

    • aws.fsx.LustreFileSystem in the Pulumi Registry: https://www.pulumi.com/registry/packages/aws/api-docs/fsx/lustrefilesystem/
    • Amazon FSx for Lustre user guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/what-is.html