How do I deploy large language models on high-memory EC2 instances?
Deploying large language models (LLMs) calls for high-memory instances because model inference is memory-intensive. In this Pulumi program, we provision an EC2 instance on AWS with enough memory to host a large language model.
The key components of the script include:
- AWS provider configuration: To interact with AWS resources.
- EC2 instance: A high-memory instance type suitable for hosting LLMs.
- Security group: To control traffic to the EC2 instance.
- Outputs: The instance ID and public DNS name, exported for reference.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Security group controlling access to the LLM host.
const llmSg = new aws.ec2.SecurityGroup("llm_sg", {
    namePrefix: "llm-sg-",
    ingress: [{
        description: "Allow SSH",
        fromPort: 22,
        toPort: 22,
        protocol: "tcp",
        // Open to the world for demonstration; restrict this to your
        // own IP range in production.
        cidrBlocks: ["0.0.0.0/0"],
    }],
    egress: [{
        // Allow all outbound traffic, e.g. for downloading model weights.
        fromPort: 0,
        toPort: 0,
        protocol: "-1",
        cidrBlocks: ["0.0.0.0/0"],
    }],
});

// High-memory EC2 instance for hosting the model.
const llmHost = new aws.ec2.Instance("llm_host", {
    // AMI IDs are region-specific; replace this with an AMI available
    // in your region.
    ami: "ami-0c55b159cbfafe1f0",
    // r5.12xlarge provides 48 vCPUs and 384 GiB of memory.
    instanceType: aws.ec2.InstanceType.R5_12XLarge,
    vpcSecurityGroupIds: [llmSg.id],
    // Must reference an existing EC2 key pair in your account.
    keyName: "my-key-pair",
    tags: {
        Name: "LLM Host",
    },
});
export const instanceId = llmHost.id;
export const instancePublicDns = llmHost.publicDns;
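As noted in the comments, AMI IDs are region-specific. One alternative is to resolve an AMI at deployment time with aws.ec2.getAmi; the sketch below is one way to do it, assuming a recent Amazon Linux 2023 x86_64 image is acceptable:

// Look up a recent Amazon Linux 2023 AMI instead of hardcoding an ID.
// The name filter is an assumption; adjust it for the OS you need.
const al2023 = aws.ec2.getAmi({
    mostRecent: true,
    owners: ["amazon"],
    filters: [
        { name: "name", values: ["al2023-ami-*-x86_64"] },
        { name: "virtualization-type", values: ["hvm"] },
    ],
});

// Then reference the result in the instance definition:
//   ami: al2023.then(img => img.id),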
Summary
In this example, we set up an AWS EC2 instance specifically for hosting large language models. We've chosen an instance type with high memory capacity (r5.12xlarge) and configured a security group to allow SSH access. The instance ID and public DNS name are exported for easy reference.
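A bare instance won't serve a model by itself; you will typically bootstrap it with a first-boot script that installs your serving stack, passed through the instance's userData property. The script below is a hypothetical sketch: the package names and the vLLM server are placeholders for whatever stack you actually run, and it assumes an Amazon Linux 2023 AMI.

// Hypothetical first-boot script; the packages and model server are
// placeholders for your actual serving stack.
const bootstrap = `#!/bin/bash
set -eu
dnf install -y python3.11 python3.11-pip    # assumes Amazon Linux 2023
python3.11 -m pip install vllm              # example model server
`;

// Add to the aws.ec2.Instance arguments above:
//   userData: bootstrap,   // executed once when the instance first boots

After pulumi up completes, you can SSH to the exported public DNS name with the key pair referenced by keyName and confirm the server installed correctly.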