Serving Large Language Models using AWS EFS

Pulumi · Accepted Answer

Serving large language models (LLMs) involves both compute and storage considerations. Amazon Elastic File System (EFS) is a managed, elastic NFS file system that can hold the large datasets and model artifacts an LLM needs. Compute instances mount the EFS filesystem and read the model data from it.

Below is a Pulumi program written in Python that creates an EFS filesystem. Your model-serving compute instances can then mount this filesystem as part of your machine learning infrastructure.

The program does the following:

Creates a security group to control access to EFS.
Defines an EFS filesystem.
Creates a mount target for the EFS in each specified subnet, with the previously created security group attached.

Ensure that you have all necessary permissions and AWS credentials configured before running this program.

import pulumi
import pulumi_aws as aws

# Step 1: Create a security group for EFS
# This security group controls the access to the EFS filesystem from the VPC
efs_security_group = aws.ec2.SecurityGroup('efs-security-group',
    description='Security group for EFS',
    vpc_id='<your_vpc_id>',  # Replace with your VPC ID
    ingress=[
        # Allow NFS traffic from within the VPC.
        # You might want to narrow this down to a specific CIDR block
        # or source security group specific to your application servers.
        aws.ec2.SecurityGroupIngressArgs(
            description='NFS access from VPC',
            protocol='tcp',
            from_port=2049,  # NFS uses port 2049
            to_port=2049,
            cidr_blocks=['<your_vpc_cidr>'],  # Replace with your VPC CIDR block
        ),
    ],
    egress=[
        # Allow all outbound traffic.
        # You might want to restrict this according to your requirements.
        aws.ec2.SecurityGroupEgressArgs(
            description='Allow all outbound traffic',
            protocol='-1',  # -1 means all protocols
            from_port=0,
            to_port=0,
            cidr_blocks=['0.0.0.0/0'],
        ),
    ],
)

# Step 2: Define an EFS filesystem.
efs_filesystem = aws.efs.FileSystem('large-model-efs',
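    # Recent pulumi_aws versions expect a list under `lifecycle_policies`
    # (older releases used a single `lifecycle_policy`).
    lifecycle_policies=[aws.efs.FileSystemLifecyclePolicyArgs(
        transition_to_ia="AFTER_30_DAYS",  # Move files not accessed for 30 days to Infrequent Access to save costs
    )],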
    tags={
        'Name': 'LargeModelEFS',
    },
)

# Step 3: Create mount targets for the EFS in multiple subnets.
# Replace '<your_subnet_id_1>' and '<your_subnet_id_2>' with the actual subnet IDs where the EFS should be available.
# EFS allows one mount target per Availability Zone, so list at most one subnet per AZ.
# Instances in that AZ then reach the filesystem through its mount target.
subnet_ids = ['<your_subnet_id_1>', '<your_subnet_id_2>']  # List your subnet IDs here
for subnet_id in subnet_ids:
    aws.efs.MountTarget(f"efs-mount-target-{subnet_id}",
        file_system_id=efs_filesystem.id,
        subnet_id=subnet_id,
        security_groups=[efs_security_group.id],  # Attach the security group we created earlier
    )

# To integrate this with your server infrastructure, you would now mount the EFS
# in your EC2 instances using the filesystem ID and mount targets.
# The instances should be configured to use this EFS as storage for your LLMs.

After creating the EFS filesystem with Pulumi, you need to integrate it with the compute instances (such as EC2 or SageMaker instances) where the large language models will run. Mount the EFS on these instances and use it to access the models or data required for serving predictions.
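As a minimal sketch of the mounting step, the helper below builds a boot script that mounts the filesystem over NFS. The function name `build_efs_user_data`, the default mount point `/mnt/models`, and the example IDs are assumptions for illustration, not part of the program above. It also assumes an NFS client is already installed on the instance and that DNS resolution is enabled in the VPC. The mount options are the ones AWS documents as recommended for EFS.

```python
import re

# NFS mount options that AWS documents as recommended for EFS.
EFS_NFS_OPTIONS = (
    "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
)

# EFS filesystem IDs are 'fs-' followed by 8 or 17 hexadecimal characters.
_FS_ID_PATTERN = r"fs-(?:[0-9a-f]{8}|[0-9a-f]{17})"


def build_efs_user_data(file_system_id: str, region: str,
                        mount_point: str = "/mnt/models") -> str:
    """Return a shell script that mounts an EFS filesystem at boot.

    Hypothetical helper: the name and the default mount point are
    illustrative assumptions.
    """
    if not re.fullmatch(_FS_ID_PATTERN, file_system_id):
        raise ValueError(f"unexpected EFS filesystem ID: {file_system_id!r}")

    # Regional DNS name of the filesystem; it resolves to the mount target
    # in the Availability Zone of the instance doing the lookup.
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"

    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        f"mkdir -p {mount_point}",
        f"mount -t nfs4 -o {EFS_NFS_OPTIONS} {dns_name}:/ {mount_point}",
        "",
    ])


if __name__ == "__main__":
    # Example values only; substitute your own filesystem ID and region.
    print(build_efs_user_data("fs-0123456789abcdef0", "us-east-1"))
```

The returned string could be passed as `user_data` when creating an `aws.ec2.Instance`. Because `efs_filesystem.id` is a Pulumi output rather than a plain string, it would need to go through `.apply(...)` before being handed to a helper like this one.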

Always remember to replace placeholder values such as <your_vpc_id>, <your_vpc_cidr>, <your_subnet_id_1>, and <your_subnet_id_2> with actual values from your AWS setup. Be mindful of the security considerations, and adjust inbound and outbound traffic rules for the security group according to your actual requirements.
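One way to catch a forgotten placeholder early is to validate the values before they reach any resource definition. The sketch below is a hypothetical helper (the name `validate_network_inputs` is an assumption, not part of Pulumi or the program above) that uses only the Python standard library. It checks that the VPC and subnet IDs have the usual AWS shape and that the CIDR block parses.

```python
import ipaddress
import re

# AWS resource IDs are a prefix plus 8 or 17 hexadecimal characters.
_HEX_SUFFIX = r"-(?:[0-9a-f]{8}|[0-9a-f]{17})"


def validate_network_inputs(vpc_id: str, vpc_cidr: str, subnet_ids: list) -> None:
    """Raise ValueError if any value still looks like an unreplaced placeholder.

    Hypothetical helper for illustration only.
    """
    if not re.fullmatch("vpc" + _HEX_SUFFIX, vpc_id):
        raise ValueError(f"not a valid VPC ID: {vpc_id!r}")

    # ip_network raises ValueError for strings such as '<your_vpc_cidr>'.
    ipaddress.ip_network(vpc_cidr)

    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    for subnet_id in subnet_ids:
        if not re.fullmatch("subnet" + _HEX_SUFFIX, subnet_id):
            raise ValueError(f"not a valid subnet ID: {subnet_id!r}")


if __name__ == "__main__":
    # Example values only; these IDs are made up.
    validate_network_inputs(
        "vpc-0abc1234", "10.0.0.0/16", ["subnet-0abc1234def567890"]
    )
    print("inputs look valid")
```

Calling this at the top of the program makes a leftover `<your_vpc_id>` fail immediately with a clear message instead of surfacing later as an AWS API error. The values themselves could also be read from stack configuration with `pulumi.Config().require(...)` rather than being hard-coded.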