1. FSx for Lustre as an AI Training Data Repository


    To set up an Amazon FSx for Lustre file system as an AI training data repository, you'd use Pulumi to provision and configure the required resources. FSx for Lustre is a high-performance file system optimized for workloads such as machine learning, high-performance computing (HPC), video processing, and financial modeling.

    We will provision an FSx for Lustre file system with the following steps:

    1. Define the Amazon VPC and subnet where FSx for Lustre will be deployed. FSx for Lustre must live in a VPC subnet to be reachable; if you would rather not hard-code IDs, see the lookup sketch after this list.
    2. Set up an FSx for Lustre file system in the subnet.
    3. Optionally, configure Amazon S3 integration if you want your data to be importable from and exportable to an S3 bucket.
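    As an alternative to hard-coding IDs, the AWS provider can look up existing networking resources for you. A minimal sketch, assuming you want to reuse the default VPC (the variable names here are illustrative):

        import pulumi_aws as aws

        # Look up the default VPC and its subnets instead of hard-coding IDs.
        default_vpc = aws.ec2.get_vpc(default=True)
        default_subnets = aws.ec2.get_subnets(
            filters=[aws.ec2.GetSubnetsFilterArgs(name="vpc-id", values=[default_vpc.id])]
        )

        # FSx for Lustre is deployed into a single subnet; pick the first one.
        subnet_id = default_subnets.ids[0]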

    For the sake of simplicity, I'll provide a basic example that assumes you have an S3 bucket ready for integration and focuses on setting up the FSx for Lustre file system. In a real-world scenario, you would likely have additional considerations such as security group setup, IAM roles, and so on.
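    For example, clients can only reach the file system if its security group allows Lustre traffic, which uses TCP port 988 plus ports 1018-1023. A minimal sketch; the resource name and CIDR ranges are placeholders you would adapt to your network:

        import pulumi_aws as aws

        # Security group permitting Lustre client traffic within the VPC.
        lustre_sg = aws.ec2.SecurityGroup(
            "fsxLustreSecurityGroup",
            vpc_id="vpc-12345",  # placeholder VPC ID
            ingress=[
                # Lustre uses TCP 988 and 1018-1023 for client/server communication.
                aws.ec2.SecurityGroupIngressArgs(
                    protocol="tcp", from_port=988, to_port=988,
                    cidr_blocks=["10.0.0.0/16"],  # assumed VPC CIDR; tighten as needed
                ),
                aws.ec2.SecurityGroupIngressArgs(
                    protocol="tcp", from_port=1018, to_port=1023,
                    cidr_blocks=["10.0.0.0/16"],
                ),
            ],
            egress=[
                aws.ec2.SecurityGroupEgressArgs(
                    protocol="-1", from_port=0, to_port=0,
                    cidr_blocks=["0.0.0.0/0"],
                ),
            ],
        )

    Its ID could then be passed to the file system through the security_group_ids argument of aws.fsx.LustreFileSystem.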

    Below is a Pulumi program that demonstrates how to create an FSx for Lustre file system using the aws.fsx.LustreFileSystem resource:

        import pulumi
        import pulumi_aws as aws

        # Pre-existing entities assumed to be in place for this configuration.
        # The VPC where you wish to deploy your FSx for Lustre file system
        # (the subnet below must belong to it).
        vpc_id = "vpc-12345"

        # The subnet inside the VPC where FSx will live.
        subnet_id = "subnet-56789"

        # The S3 bucket to associate with the file system for import/export.
        your_s3_bucket_name_for_import_export = "your-s3-bucket-name"

        # Create an FSx for Lustre file system.
        fsx_lustre_filesystem = aws.fsx.LustreFileSystem(
            "fsxLustreFilesystem",
            # Attach the Lustre file system to the VPC subnet.
            subnet_ids=[subnet_id],
            # Minimum storage capacity for Lustre in gibibytes (GiB) is 1200.
            storage_capacity=1200,
            # S3 bucket to be associated with the FSx file system.
            import_path=f"s3://{your_s3_bucket_name_for_import_export}",
            # Path on S3 where exported files will be saved.
            export_path=f"s3://{your_s3_bucket_name_for_import_export}/export",
            # Deployment type (SCRATCH_1 | SCRATCH_2 | PERSISTENT_1 | PERSISTENT_2).
            # SCRATCH_2 offers temporary, high-throughput storage.
            deployment_type="SCRATCH_2",
            # Maintenance start time expressed in the format d:HH:MM.
            weekly_maintenance_start_time="2:03:00",
            # Note: per_unit_storage_throughput and the automatic backup settings
            # are only valid for PERSISTENT deployment types; see the variant below.
        )

        # Export the FSx for Lustre file system's DNS name and mount name.
        # The DNS name is used to mount the file system from an EC2 instance,
        # and the mount name is a unique identifier for the file system.
        pulumi.export('fsx_lustre_dns_name', fsx_lustre_filesystem.dns_name)
        pulumi.export('fsx_lustre_mount_name', fsx_lustre_filesystem.mount_name)

    In this Pulumi Python program, the following actions occur:

    • An FSx for Lustre file system named fsxLustreFilesystem is created.
    • We specify the subnet_ids where the FSx file system should be deployed.
    • A storage capacity of 1200 GiB is defined, which is the minimum necessary for FSx for Lustre.
    • The export_path and import_path are set up for S3 integration. Set the your_s3_bucket_name_for_import_export variable to the actual bucket name you want to use.
    • The deployment_type is set to SCRATCH_2, which suits temporary, high-throughput storage such as scratch space for training runs.
    • per_unit_storage_throughput (throughput per tebibyte of storage, in MB/s) applies only to PERSISTENT deployment types, so it is omitted here; the variant after this list shows where it fits.
    • The weekly_maintenance_start_time controls when maintenance occurs. Automatic backups (automatic_backup_retention_days and daily_automatic_backup_start_time) are likewise only available on PERSISTENT deployments.
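    If you need durable storage with provisioned throughput and automatic backups, a PERSISTENT_1 deployment accepts those arguments. A minimal sketch, reusing the subnet_id placeholder from the program above:

        # PERSISTENT_1 variant: durable storage with provisioned throughput
        # and automatic backups (not available on SCRATCH deployments).
        fsx_lustre_persistent = aws.fsx.LustreFileSystem(
            "fsxLustrePersistent",
            subnet_ids=[subnet_id],
            storage_capacity=1200,
            deployment_type="PERSISTENT_1",
            # Throughput per tebibyte of storage in MB/s (50, 100, or 200 for SSD).
            per_unit_storage_throughput=200,
            # Retention period for automatic backups, in days.
            automatic_backup_retention_days=7,
            # Time of day (UTC, HH:MM) when automatic backups are taken.
            daily_automatic_backup_start_time="04:00",
        )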

    Remember to replace the placeholder values such as vpc-12345, subnet-56789, and your_s3_bucket_name_for_import_export with your actual VPC, subnet, and S3 bucket information.

    After deployment, the DNS name and mount name of the FSx file system are exported. Both values are needed to connect to the file system, for example from an Amazon EC2 instance running your AI training workloads.
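    Since the two values are always used together, one convenient pattern is to export a ready-to-run mount command assembled from both outputs. A sketch, assuming the Lustre client is installed on the instance and /mnt/fsx is your chosen mount point:

        # Assemble the Lustre mount command from the file system's outputs.
        # /mnt/fsx is an arbitrary mount point on the client instance.
        mount_command = pulumi.Output.concat(
            "sudo mount -t lustre ",
            fsx_lustre_filesystem.dns_name,
            "@tcp:/",
            fsx_lustre_filesystem.mount_name,
            " /mnt/fsx",
        )
        pulumi.export('fsx_lustre_mount_command', mount_command)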

    Documentation links for more details:

    • aws.fsx.LustreFileSystem in the Pulumi Registry: https://www.pulumi.com/registry/packages/aws/api-docs/fsx/lustrefilesystem/
    • Amazon FSx for Lustre user guide: https://docs.aws.amazon.com/fsx/latest/LustreGuide/what-is.html