Skip to main content
Pulumi logo

Run Open-Source LLMs on AWS EC2 with Ollama and Pulumi

Posted on · Updated on

TL;DR. Want to self-host an open-source LLM on AWS? Use a g4dn.xlarge ($0.526/hr on-demand, 16 GB GPU memory) for 7B/8B models, a g5.xlarge ($1.006/hr, 24 GB) for 13B–14B models, a g5.2xlarge ($1.212/hr, 24 GB) for 32B models, or a g6e.2xlarge ($2.242/hr, 48 GB) for 70B models. Deploy with the Pulumi program below and Ollama will run any model from its library: DeepSeek-R1, Llama 3, Qwen, or Mistral, with a one-line change.

This guide walks through that deployment end-to-end: a single Pulumi program that provisions a GPU-enabled EC2 instance, installs Ollama and Open WebUI via cloud-init, and exposes both a chat UI and an OpenAI-compatible API. The model is configurable, so you can swap DeepSeek-R1 for Llama 3.1, Qwen 2.5, or Mistral without touching the infrastructure code.

  1. Why run open-source LLMs on AWS EC2?
  2. Which models can I run, and which EC2 instance do I need?
  3. How much does this cost vs. hosted APIs?
  4. How do I deploy Ollama on AWS EC2 with Pulumi?
  5. How do I switch models?
  6. What are the next steps?

Why run open-source LLMs on AWS EC2?

Self-hosting an open-source LLM on AWS gives you three things hosted APIs can’t: data stays inside your VPC, per-token costs collapse to a flat hourly rate at high volume, and you can fine-tune or quantize models freely under permissive licenses. Ollama handles all three concerns from a single binary: it downloads, manages, and serves models behind an OpenAI-compatible API on port 11434.

The original version of this post focused on DeepSeek-R1 because it landed in late January 2025 and reset expectations for what an open-weight reasoning model could do. DeepSeek-R1 is still an excellent default (MIT-licensed, strong on math and coding, with distilled 1.5B–70B variants) but the same infrastructure runs Meta’s Llama 3, Alibaba’s Qwen, and Mistral equally well. Picking a model is now a config change, not an infrastructure decision.

A bar chart compares the performance of DeepSeek and OpenAI models across six benchmarks: AIME 2024, Codeforces, GPQA Diamond, MATH-500, MMLU, and SWE-bench Verified. The models evaluated include DeepSeek-R1, DeepSeek-R1-32B, DeepSeek-V3, OpenAI-o1-1217, and OpenAI-o1-mini, with accuracy or percentile scores represented as bars. DeepSeek-R1 (blue-striped) consistently ranks among the top performers, particularly excelling in MATH-500 (97.3%), MMLU (90.8%), and Codeforces (96.3%). The chart visually distinguishes each model using different colors and shading.

For reference, DeepSeek-R1 scores 79.8% on AIME 2024 (vs. 79.2% for OpenAI o1-1217), 97.3% on MATH-500 (vs. 96.4%), 96.3% on Codeforces (vs. 96.6%), and 49.2% on SWE-bench Verified (vs. 48.9%)—near-parity with closed frontier models on most reasoning benchmarks, with the same caveat that benchmark scores age fast.

Which models can I run, and which EC2 instance do I need?

The bottleneck for inference is GPU memory (VRAM): the model weights have to fit in it, plus a few GB of headroom for the KV cache. The table below maps the most common Ollama models to the smallest AWS GPU instance that comfortably runs them at 4-bit (Q4) quantization, which is what ollama pull <model> gives you by default.

Model familySizesApprox. VRAM (Q4)Smallest EC2 instanceOn-demand price (us-east-1)
DeepSeek-R1 (distill)1.5B / 7B / 8B1–6 GBg4dn.xlarge (T4, 16 GB)$0.526/hr
Llama 3.1 / Llama 3.28B~5 GBg4dn.xlarge (T4, 16 GB)$0.526/hr
Qwen 2.57B~5 GBg4dn.xlarge (T4, 16 GB)$0.526/hr
Mistral 7B / Mistral Nemo7B / 12B5–8 GBg4dn.xlarge (T4, 16 GB)$0.526/hr
DeepSeek-R1 (distill)14B~10 GBg5.xlarge (A10G, 24 GB)$1.006/hr
Llama 3.3 / DeepSeek-R132B / 32B distill~20 GBg5.2xlarge (A10G, 24 GB)$1.212/hr
Llama 3.1 / DeepSeek-R170B~42 GBg6e.2xlarge (L40S, 48 GB)$2.242/hr
DeepSeek-R1 (full)671B (MoE)400 GB+p5.48xlarge or multi-node$98.32/hr

For most workloads—internal tools, RAG backends, code assistants—a g4dn.xlarge running an 8B model is the right starting point. Move up only if quality is the bottleneck.

How much does this cost vs. hosted APIs?

A g4dn.xlarge running 24/7 costs ~$378/month on-demand, or ~$237/month with a 1-year reserved instance. Whether that’s cheaper than a hosted API depends entirely on your token volume.

Compare against hosted pricing as of April 2026 (input + output blended, rough numbers):

ProviderModelApprox. blended price
OpenAIGPT-4o-mini~$0.30 per 1M tokens
OpenAIGPT-4o~$5.00 per 1M tokens
AnthropicClaude Sonnet 4~$6.00 per 1M tokens
DeepSeek (hosted)DeepSeek-V3~$0.50 per 1M tokens

A g4dn.xlarge running Llama 3.1 8B sustains roughly 40–60 tokens/sec under single-user load, or about 100–155M tokens/month at 100% utilization. At that ceiling the effective rate is ~$2.40/M tokens—cheaper than GPT-4o or Claude, more expensive than GPT-4o-mini or DeepSeek’s own hosted API.

The takeaway: self-hosting wins on data residency, latency, and predictable cost at high utilization. Hosted APIs win below ~10M tokens/month or when you need frontier-class quality. Run the math against your actual token volume before committing.

How do I deploy Ollama on AWS EC2 with Pulumi?

Prerequisites

Before we start, make sure you have the following:

What is Ollama?

A black-and-white digital illustration of Ollama’s mascot, a stylized llama, wearing a “WORK!!” headband while intensely focused on paperwork. The mascot sits at a desk surrounded by towering stacks of documents, with scattered sheets and a coffee mug, conveying a sense of heavy workload and determination.

Ollama is an open-source runtime that downloads, manages, and serves large language models from a single binary. It runs on macOS, Linux, and Windows, supports GPU acceleration through CUDA and Metal, and exposes both a native HTTP API and an OpenAI-compatible API on port 11434. Most clients written for the OpenAI SDK work against an Ollama endpoint with only a base-URL change.

It supports the major open-weight families—DeepSeek-R1, Llama 3, Qwen, Mistral, Gemma, Phi—plus quantized and distilled variants for each. You pull a model by name and tag (ollama pull llama3.1:8b) and run it (ollama run llama3.1:8b); Ollama handles the rest.

Architecture

A diagram illustrating an AWS-based deployment with an EC2 GPU-enabled instance running Ollama and Open-WebUI within a public subnet of a VPC. The setup includes a Docker container connected to an open-source LLM served by Ollama (such as DeepSeek-R1:7B or any Ollama-supported model), represented by a blue box with an arrow pointing from the EC2 instance. The Ollama mascot is depicted as part of the architecture.

Create a new Pulumi project

First, scaffold a new Pulumi project. Run the following from an empty directory:

# Replace <language> with typescript, python, go, csharp, or yaml
pulumi new aws-<language>

Pick the language you are most comfortable with. The template installs the AWS provider and creates a working sample. You can delete the sample code—we’ll replace it with the snippets below.

Step 1: Create an instance role with S3 access

The EC2 instance needs to download NVIDIA drivers from a public AWS-managed S3 bucket. Create an IAM role with S3 read access and attach it to an instance profile:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as fs from "fs";

const role = new aws.iam.Role("deepSeekRole", {
    name: "deepseek-role",
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
            {
                Action: "sts:AssumeRole",
                Effect: "Allow",
                Principal: {
                    Service: "ec2.amazonaws.com",
                },
            },
        ],
    }),
});

new aws.iam.RolePolicyAttachment("deepSeekS3Policy", {
    policyArn: "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    role: role.name,
});

const instanceProfile = new aws.iam.InstanceProfile("deepSeekProfile", {
    name: "deepseek-profile",
    role: role.name,
});
import pulumi
import pulumi_aws as aws
import json
import os

# IAM Role for EC2 instances
role = aws.iam.Role(
    "deepSeekRole",
    name="deepseek-role",
    assume_role_policy=json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "sts:AssumeRole",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ec2.amazonaws.com",
                    },
                }
            ],
        }
    ),
)

# Attach S3 read-only policy to the IAM Role
iam_policy_attachment = aws.iam.RolePolicyAttachment(
    "deepSeekS3Policy",
    policy_arn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    role=role.name,
)

# Instance Profile containing the IAM Role
instance_profile = aws.iam.InstanceProfile(
    "deepSeekProfile", name="deepseek-profile", role=role.name
)
package main

import (
	"encoding/json"
	"os"

	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/ec2"
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/iam"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		rolePolicy, err := json.Marshal(map[string]interface{}{
			"Version": "2012-10-17",
			"Statement": []map[string]interface{}{
				{
					"Action":    "sts:AssumeRole",
					"Effect":    "Allow",
					"Principal": map[string]interface{}{"Service": "ec2.amazonaws.com"},
				},
			},
		})
		if err != nil {
			return err
		}

		role, err := iam.NewRole(ctx, "deepSeekRole", &iam.RoleArgs{
			Name:             pulumi.String("deepseek-role"),
			AssumeRolePolicy: pulumi.String(rolePolicy),
		})
		if err != nil {
			return err
		}

		// Attach S3 read-only policy to the IAM Role
		_, err = iam.NewRolePolicyAttachment(ctx, "deepSeekS3Policy", &iam.RolePolicyAttachmentArgs{
			PolicyArn: pulumi.String("arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"),
			Role:      role.Name,
		})
		if err != nil {
			return err
		}

		// Instance Profile containing the IAM Role
		instanceProfile, err := iam.NewInstanceProfile(ctx, "deepSeekProfile", &iam.InstanceProfileArgs{
			Name: pulumi.String("deepseek-profile"),
			Role: role.Name,
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using Pulumi;
using Pulumi.Aws.Ec2;
using Pulumi.Aws.Iam;
using System.Collections.Generic;
using System.IO;
using Pulumi.Aws.Ec2.Inputs;
using System.Text.Json;

return await Deployment.RunAsync(() =>
{
    var rolePolicy = JsonSerializer.Serialize(new Dictionary<string, object>
    {
    {
        ["Version"] = "2012-10-17",
        ["Statement"] = new[]
        {
            new Dictionary<string, object>
            {
                ["Action"] = "sts:AssumeRole",
                ["Effect"] = "Allow",
                ["Principal"] = new Dictionary<string, string>
                {
                    ["Service"] = "ec2.amazonaws.com"
                }
            }
        }
    });

    var role = new Role("deepSeekRole", new RoleArgs
    {
        Name = "deepseek-role",
        AssumeRolePolicy = rolePolicy
    });

    var rolePolicyAttachment = new RolePolicyAttachment("deepSeekS3Policy", new RolePolicyAttachmentArgs
    {
        PolicyArn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
        Role = role.Name
    });

    var instanceProfile = new InstanceProfile("deepSeekProfile", new InstanceProfileArgs
    {
        Name = "deepseek-profile",
        Role = role.Name
    });

    var vpc = new Vpc("deepSeekVpc", new VpcArgs
    {
        CidrBlock = "10.0.0.0/16",
        EnableDnsHostnames = true,
        EnableDnsSupport = true
    });

    var subnet = new Subnet("deepSeekSubnet", new SubnetArgs
    {
        VpcId = vpc.Id,
        CidrBlock = "10.0.48.0/20",
        AvailabilityZone = "eu-central-1a",
        MapPublicIpOnLaunch = true
    });

    var internetGateway = new InternetGateway("deepSeekInternetGateway", new InternetGatewayArgs
    {
        VpcId = vpc.Id
    });

    var routeTable = new RouteTable("deepSeekRouteTable", new RouteTableArgs
name: deepseek-ollama-yaml
description: DeepSeek Ollama AWS example
runtime: yaml

variables:
  publicKey:
    fn::readFile: ./deepseek.rsa
  userData:
    fn::readFile: ./cloud-init.yaml
  amiFilter: "amzn2-ami-hvm-*-x86_64-gp2"
  amiOwner: "137112412989"
  amiId:
    fn::invoke:
      function: aws:ec2:getAmi
      arguments:
        filters:
          - name: name
            values: ["${amiFilter}"]
        owners: ["${amiOwner}"]
        mostRecent: true
      return: id

resources:
  deepSeekRole:
    type: aws:iam:Role
    properties:
      name: deepseek-role
      assumeRolePolicy: |
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "sts:AssumeRole",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ec2.amazonaws.com"
                    }
                }
            ]
        }

  deepSeekS3Policy:
    type: aws:iam:RolePolicyAttachment
    properties:
      role: ${deepSeekRole.name}
      policyArn: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

  deepSeekProfile:
    type: aws:iam:InstanceProfile
    properties:
      name: deepseek-profile
      role: ${deepSeekRole.name}

Step 2: Create the network

Next, create the VPC, subnet, internet gateway, and route table. The security group opens ports 22 (SSH), 3000 (Open WebUI), and 11434 (Ollama API):

const vpc = new aws.ec2.Vpc("deepSeekVpc", {
    cidrBlock: "10.0.0.0/16",
    enableDnsHostnames: true,
    enableDnsSupport: true,
});

const subnet = new aws.ec2.Subnet("deepSeekSubnet", {
    vpcId: vpc.id,
    cidrBlock: "10.0.48.0/20",
    availabilityZone: pulumi.interpolate`${aws.getAvailabilityZones().then(it => it.names[0])}`,
    mapPublicIpOnLaunch: true,
});

const internetGateway = new aws.ec2.InternetGateway("deepSeekInternetGateway", {
    vpcId: vpc.id,
});

const routeTable = new aws.ec2.RouteTable("deepSeekRouteTable", {
    vpcId: vpc.id,
    routes: [
        {
            cidrBlock: "0.0.0.0/0",
            gatewayId: internetGateway.id,
        },
    ],
});

const routeTableAssociation = new aws.ec2.RouteTableAssociation("deepSeekRouteTableAssociation", {
    subnetId: subnet.id,
    routeTableId: routeTable.id,
});

const securityGroup = new aws.ec2.SecurityGroup("deepSeekSecurityGroup", {
    vpcId: vpc.id,
    egress: [
        {
            fromPort: 0,
            toPort: 0,
            protocol: "-1",
            cidrBlocks: ["0.0.0.0/0"],
        },
    ],
    ingress: [
        {
            fromPort: 22,
            toPort: 22,
            protocol: "tcp",
            cidrBlocks: ["0.0.0.0/0"],
        },
        {
            fromPort: 3000,
            toPort: 3000,
            protocol: "tcp",
            cidrBlocks: ["0.0.0.0/0"],
        },
        {
            fromPort: 11434,
            toPort: 11434,
            protocol: "tcp",
            cidrBlocks: ["0.0.0.0/0"],
        },
    ],
});

# Create a VPC
vpc = aws.ec2.Vpc(
    "deepSeekVpc",
    cidr_block="10.0.0.0/16",
    enable_dns_hostnames=True,
    enable_dns_support=True,
)

# Create a subnet
subnet = aws.ec2.Subnet(
    "deepSeekSubnet",
    vpc_id=vpc.id,
    cidr_block="10.0.48.0/20",
    availability_zone="eu-central-1a",
    map_public_ip_on_launch=True,
)

# Create an internet gateway
internet_gateway = aws.ec2.InternetGateway("deepSeekInternetGateway", vpc_id=vpc.id)

# Create a route table and route table association
route_table = aws.ec2.RouteTable(
    "deepSeekRouteTable",
    vpc_id=vpc.id,
    routes=[
        aws.ec2.RouteTableRouteArgs(
            cidr_block="0.0.0.0/0", gateway_id=internet_gateway.id
        )
    ],
)

route_table_association = aws.ec2.RouteTableAssociation(
    "deepSeekRouteTableAssociation", subnet_id=subnet.id, route_table_id=route_table.id
)

# Create a security group
security_group = aws.ec2.SecurityGroup(
    "deepSeekSecurityGroup",
    vpc_id=vpc.id,
    egress=[
        {
            "from_port": 0,
            "to_port": 0,
            "protocol": "-1",
            "cidr_blocks": ["0.0.0.0/0"],
        }
    ],
    ingress=[
        {
            "from_port": 22,
            "to_port": 22,
            "protocol": "tcp",
            "cidr_blocks": ["0.0.0.0/0"],
        },
        {
            "from_port": 3000,
            "to_port": 3000,
            "protocol": "tcp",
            "cidr_blocks": ["0.0.0.0/0"],
        },
        {
            "from_port": 11434,
            "to_port": 11434,
            "protocol": "tcp",
            "cidr_blocks": ["0.0.0.0/0"],
        },
    ],
)
package main

import (
	"encoding/json"
	"os"

	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/ec2"
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/iam"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
                // omitted for brevity
		}

		// Create a VPC
		vpc, err := ec2.NewVpc(ctx, "deepSeekVpc", &ec2.VpcArgs{
			CidrBlock:          pulumi.String("10.0.0.0/16"),
			EnableDnsHostnames: pulumi.Bool(true),
			EnableDnsSupport:   pulumi.Bool(true),
		})
		if err != nil {
			return err
		}

		// Create a subnet
		subnet, err := ec2.NewSubnet(ctx, "deepSeekSubnet", &ec2.SubnetArgs{
			VpcId:               vpc.ID(),
			CidrBlock:           pulumi.String("10.0.48.0/20"),
			AvailabilityZone:    pulumi.String("eu-central-1a"),
			MapPublicIpOnLaunch: pulumi.Bool(true),
		})
		if err != nil {
			return err
		}

		// Create an internet gateway
		internetGateway, err := ec2.NewInternetGateway(ctx, "deepSeekInternetGateway", &ec2.InternetGatewayArgs{
			VpcId: vpc.ID(),
		})
		if err != nil {
			return err
		}

		// Create a route table and route table association
		routeTable, err := ec2.NewRouteTable(ctx, "deepSeekRouteTable", &ec2.RouteTableArgs{
			VpcId: vpc.ID(),
			Routes: ec2.RouteTableRouteArray{
				&ec2.RouteTableRouteArgs{
					CidrBlock: pulumi.String("0.0.0.0/0"),
					GatewayId: internetGateway.ID(),
				},
			},
		})
		if err != nil {
			return err
		}

		_, err = ec2.NewRouteTableAssociation(ctx, "deepSeekRouteTableAssociation", &ec2.RouteTableAssociationArgs{
			SubnetId:     subnet.ID(),
			RouteTableId: routeTable.ID(),
		})
		if err != nil {
			return err
		}

		// Create a security group
		securityGroup, err := ec2.NewSecurityGroup(ctx, "deepSeekSecurityGroup", &ec2.SecurityGroupArgs{
			VpcId: vpc.ID(),
			Egress: ec2.SecurityGroupEgressArray{
				&ec2.SecurityGroupEgressArgs{
					FromPort:   pulumi.Int(0),
					ToPort:     pulumi.Int(0),
					Protocol:   pulumi.String("-1"),
					CidrBlocks: pulumi.StringArray{pulumi.String("0.0.0.0/0")},
				},
			},
			Ingress: ec2.SecurityGroupIngressArray{
				&ec2.SecurityGroupIngressArgs{
					FromPort:   pulumi.Int(22),
					ToPort:     pulumi.Int(22),
					Protocol:   pulumi.String("tcp"),
					CidrBlocks: pulumi.StringArray{pulumi.String("0.0.0.0/0")},
				},
				&ec2.SecurityGroupIngressArgs{
					FromPort:   pulumi.Int(3000),
					ToPort:     pulumi.Int(3000),
					Protocol:   pulumi.String("tcp"),
					CidrBlocks: pulumi.StringArray{pulumi.String("0.0.0.0/0")},
				},
				&ec2.SecurityGroupIngressArgs{
					FromPort:   pulumi.Int(11434),
					ToPort:     pulumi.Int(11434),
					Protocol:   pulumi.String("tcp"),
					CidrBlocks: pulumi.StringArray{pulumi.String("0.0.0.0/0")},
				},
			},
		})
		if err != nil {
			return err
		}

		return nil
	})
}
using Pulumi;
using Pulumi.Aws.Ec2;
using Pulumi.Aws.Iam;
using System.Collections.Generic;
using System.IO;
using Pulumi.Aws.Ec2.Inputs;
using System.Text.Json;

return await Deployment.RunAsync(() =>
{
    var rolePolicy = JsonSerializer.Serialize(new Dictionary<string, object>
    {
            // omitted for brevity
    {
        VpcId = vpc.Id,
        Routes =
        {
            new RouteTableRouteArgs
            {
                CidrBlock = "0.0.0.0/0",
                GatewayId = internetGateway.Id
            }
        }
    });

    var routeTableAssociation = new RouteTableAssociation("deepSeekRouteTableAssociation", new RouteTableAssociationArgs
    {
        SubnetId = subnet.Id,
        RouteTableId = routeTable.Id
    });

    var securityGroup = new SecurityGroup("deepSeekSecurityGroup", new SecurityGroupArgs
    {
        VpcId = vpc.Id,
        Egress =
        {
            new SecurityGroupEgressArgs
            {
                FromPort = 0,
                ToPort = 0,
                Protocol = "-1",
                CidrBlocks = { "0.0.0.0/0" }
            }
        },
        Ingress =
        {
            new SecurityGroupIngressArgs
            {
                FromPort = 22,
                ToPort = 22,
                Protocol = "tcp",
                CidrBlocks = { "0.0.0.0/0" }
            },
            new SecurityGroupIngressArgs
            {
                FromPort = 3000,
                ToPort = 3000,
                Protocol = "tcp",
                CidrBlocks = { "0.0.0.0/0" }
            },
            new SecurityGroupIngressArgs
            {
                FromPort = 11434,
                ToPort = 11434,
                Protocol = "tcp",
                CidrBlocks = { "0.0.0.0/0" }
            }
        }
    });

    var publicKey = File.ReadAllText("deepseek.rsa");
    var keyPair = new KeyPair("deepSeekKey", new KeyPairArgs
    {
        PublicKey = publicKey
    });

    var amazonLinux = Pulumi.Aws.Ec2.GetAmi.Invoke(new()
    {
        MostRecent = true,
        Filters = new[]
        {
            new GetAmiFilterInputArgs
            {
                Name = "name",
                Values = new[] { "amzn2-ami-hvm-*-x86_64-gp2" }
            },
            new GetAmiFilterInputArgs
            {
                Name = "architecture",

  deepSeekVpc:
    type: aws:ec2:Vpc
    properties:
      cidrBlock: 10.0.0.0/16
      enableDnsHostnames: true
      enableDnsSupport: true

  deepSeekSubnet:
    type: aws:ec2:Subnet
    properties:
      vpcId: ${deepSeekVpc.id}
      cidrBlock: 10.0.48.0/20
      availabilityZone: eu-central-1a
      mapPublicIpOnLaunch: true

  deepSeekInternetGateway:
    type: aws:ec2:InternetGateway
    properties:
      vpcId: ${deepSeekVpc.id}

  deepSeekRouteTable:
    type: aws:ec2:RouteTable
    properties:
      vpcId: ${deepSeekVpc.id}
      routes:
        - cidrBlock: 0.0.0.0/0
          gatewayId: ${deepSeekInternetGateway.id}

  deepSeekRouteTableAssociation:
    type: aws:ec2:RouteTableAssociation
    properties:
      subnetId: ${deepSeekSubnet.id}
      routeTableId: ${deepSeekRouteTable.id}

  deepSeekSecurityGroup:
    type: aws:ec2:SecurityGroup
    properties:
      vpcId: ${deepSeekVpc.id}
      ingress:
        - fromPort: 22
          toPort: 22
          protocol: tcp
          cidrBlocks:
            - 0.0.0.0/0
        - fromPort: 3000
          toPort: 3000
          protocol: tcp
          cidrBlocks:
            - 0.0.0.0/0
        - fromPort: 11434
          toPort: 11434
          protocol: tcp
          cidrBlocks:
            - 0.0.0.0/0
      egress:
        - fromPort: 0
          toPort: 0
          protocol: -1
          cidrBlocks:
            - 0.0.0.0/0

Step 3: Launch the GPU EC2 instance

Now create the EC2 instance itself. The example uses Amazon Linux because the NVIDIA driver install path is well-trodden, plus an SSH key pair you generate locally.

The default instance type is g4dn.xlarge—the cheapest option that fits any 7B/8B model. Bump it up if you picked a larger model from the table above: g5.xlarge for 13B–14B, g5.2xlarge for 32B, g6e.2xlarge for 70B. AWS publishes full specs for the G4, G5, and G6 families.

Generate the key pair if you don’t already have one:

openssl genrsa -out deepseek.pem 2048
openssl rsa -in deepseek.pem -pubout > deepseek.pub
ssh-keygen -f mykey.pub -i -mPKCS8 > deepseek.pem
const keyPair = new aws.ec2.KeyPair("deepSeekKey", {
    publicKey: pulumi.output(fs.readFileSync("deepseek.rsa", "utf-8")),
});

const deepSeekAmi = aws.ec2
    .getAmi({
        filters: [
            {
                name: "name",
                values: ["amzn2-ami-hvm-2.0.*-x86_64-gp2"],
            },
            {
                name: "architecture",
                values: ["x86_64"],
            },
        ],
        owners: ["137112412989"], // Amazon
        mostRecent: true,
    })
    .then(ami => ami.id);

const deepSeekInstance = new aws.ec2.Instance("deepSeekInstance", {
    ami: deepSeekAmi,
    instanceType: "g4dn.xlarge",
    keyName: keyPair.keyName,
    rootBlockDevice: {
        volumeSize: 100,
        volumeType: "gp3",
    },
    subnetId: subnet.id,
    vpcSecurityGroupIds: [securityGroup.id],
    iamInstanceProfile: instanceProfile.name,
    userData: fs.readFileSync("cloud-init.yaml", "utf-8"),
    tags: {
        Name: "deepSeek-server",
    },
});

export const amiId = deepSeekAmi;
export const instanceId = deepSeekInstance.id;
export const instancePublicDns = deepSeekInstance.publicIp;

# Key pair for SSH access
public_key = open("deepseek.rsa", "r").read()
key_pair = aws.ec2.KeyPair("deepSeekKey", public_key=public_key)


# Get the latest Amazon Linux 2 AMI
ami = aws.ec2.get_ami(
    filters=[
        {"name": "name", "values": ["amzn2-ami-hvm-2.0.*-x86_64-gp2"]},
        {"name": "architecture", "values": ["x86_64"]},
    ],
    owners=["137112412989"],  # Amazon
    most_recent=True,
).id

# Create an EC2 instance
user_data = open("cloud-init.yaml", "r").read()
instance = aws.ec2.Instance(
    "deepSeekInstance",
    ami=ami,
    instance_type="g4dn.xlarge",
    key_name=key_pair.key_name,
    root_block_device=aws.ec2.InstanceRootBlockDeviceArgs(
        volume_size=100, volume_type="gp3"
    ),
    subnet_id=subnet.id,
    vpc_security_group_ids=[security_group.id],
    iam_instance_profile=instance_profile.name,
    user_data=user_data,
    tags={"Name": "deepSeek-server"},
)

pulumi.export("amiId", ami)
pulumi.export("instanceId", instance.id)
pulumi.export("instancePublicDns", instance.public_ip)
package main

import (
	"encoding/json"
	"os"

	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/ec2"
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/iam"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
                // omitted for brevity

		// Key pair for SSH access
		publicKey, err := os.ReadFile("deepseek.rsa")
		if err != nil {
			return err
		}

		keyPair, err := ec2.NewKeyPair(ctx, "deepSeekKey", &ec2.KeyPairArgs{
			PublicKey: pulumi.String(string(publicKey)),
		})
		if err != nil {
			return err
		}

		// Get the latest Amazon Linux 2 AMI
		mostRecent := true
		ami, err := ec2.LookupAmi(ctx, &ec2.LookupAmiArgs{
			Filters: []ec2.GetAmiFilter{
				{
					Name:   "name",
					Values: []string{"amzn2-ami-hvm-2.0.*-x86_64-gp2"},
				},
				{
					Name:   "architecture",
					Values: []string{"x86_64"},
				},
			},
			Owners:     []string{"137112412989"},
			MostRecent: &mostRecent,
		})
		if err != nil {
			return err
		}

		// Create an EC2 instance
		userData, err := os.ReadFile("cloud-init.yaml")
		if err != nil {
			return err
		}

		instance, err := ec2.NewInstance(ctx, "deepSeekInstance", &ec2.InstanceArgs{
			Ami:          pulumi.String(ami.Id),
			InstanceType: pulumi.String("g4dn.xlarge"),
			KeyName:      keyPair.KeyName,
			RootBlockDevice: &ec2.InstanceRootBlockDeviceArgs{
				VolumeSize: pulumi.Int(100),
				VolumeType: pulumi.String("gp3"),
			},
			SubnetId:            subnet.ID(),
			VpcSecurityGroupIds: pulumi.StringArray{securityGroup.ID()},
			IamInstanceProfile:  instanceProfile.Name,
			UserData:            pulumi.String(string(userData)),
			Tags: pulumi.StringMap{
				"Name": pulumi.String("deepSeek-server"),
			},
		})
		if err != nil {
			return err
		}

		ctx.Export("amiId", pulumi.String(ami.Id))
		ctx.Export("instanceId", instance.ID())
		ctx.Export("instancePublicDns", instance.PublicIp)

		return nil
	})
}
using Pulumi;
using Pulumi.Aws.Ec2;
using Pulumi.Aws.Iam;
using System.Collections.Generic;
using System.IO;
using Pulumi.Aws.Ec2.Inputs;
using System.Text.Json;

return await Deployment.RunAsync(() =>
{
    var rolePolicy = JsonSerializer.Serialize(new Dictionary<string, object>
    {
            // omitted for brevity
                Name = "architecture",
                Values = new[] { "x86_64" }
            }
        },
        Owners = new[] { "137112412989" }
    });

    var userData = File.ReadAllText("cloud-init.yaml");
    var instance = new Instance("deepSeekInstance", new InstanceArgs
    {
        Ami = amazonLinux.Apply(r => r.Id),
        InstanceType = "g4dn.xlarge",
        KeyName = keyPair.KeyName,
        RootBlockDevice = new InstanceRootBlockDeviceArgs
        {
            VolumeSize = 100,
            VolumeType = "gp3"
        },
        SubnetId = subnet.Id,
        VpcSecurityGroupIds = { securityGroup.Id },
        IamInstanceProfile = instanceProfile.Name,
        UserData = userData,
        Tags = { { "Name", "deepSeek-server" } }
    });

    return new Dictionary<string, object?>
    {
        ["amiId"] = amazonLinux.Apply(r => r.Id),
        ["instanceId"] = instance.Id,
        ["instancePublicDns"] = instance.PublicIp
    };
});

  deepSeekKey:
    type: aws:ec2:KeyPair
    properties:
      publicKey: ${publicKey}

  deepSeekInstance:
    type: aws:ec2:Instance
    properties:
      ami: ${amiId}
      instanceType: "g4dn.xlarge"
      keyName: ${deepSeekKey.keyName}
      rootBlockDevice:
        volumeSize: 100
        volumeType: gp3
      subnetId: ${deepSeekSubnet.id}
      vpcSecurityGroupIds:
        - ${deepSeekSecurityGroup.id}
      iamInstanceProfile: ${deepSeekProfile.name}
      userData: ${userData}
      tags:
        Name: deepSeek-server

outputs:
  AmiId: ${amiId}
  InstanceId: ${deepSeekInstance.id}
  InstancePublicDns: ${deepSeekInstance.publicIp}

Step 4: Install Ollama via cloud-init

The EC2 instance is a blank box until cloud-init runs. The user-data script below installs the NVIDIA GRID drivers, Docker, and the NVIDIA Container Toolkit, then starts the Ollama and Open WebUI containers. To switch models, edit the ollama run line—the rest is identical regardless of which model you want.

#cloud-config
users:
- default

package_update: true

packages:
- apt-transport-https
- ca-certificates
- curl
- openjdk-17-jre-headless
- gcc

runcmd:
- yum install -y gcc kernel-devel-$(uname -r)
- aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .
- chmod +x NVIDIA-Linux-x86_64*.run
- /bin/sh ./NVIDIA-Linux-x86_64*.run --tmpdir . --silent
- touch /etc/modprobe.d/nvidia.conf
- echo "options nvidia NVreg_EnableGpuFirmware=0" | sudo tee --append /etc/modprobe.d/nvidia.conf
- yum install -y docker
- usermod -a -G docker ec2-user
- systemctl enable docker.service
- systemctl start docker.service
- curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
- yum install -y nvidia-container-toolkit
- nvidia-ctk runtime configure --runtime=docker
- systemctl restart docker
- docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama --restart always ollama/ollama
- sleep 120
- docker exec ollama ollama run deepseek-r1:7b
- docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Step 5: Deploy the infrastructure

Make sure your AWS credentials are configured:

aws configure

Pulumi supports several other authentication methods for the AWS provider. Once credentials are in place, deploy the infrastructure:

pulumi up

Pulumi previews the changes; type yes to confirm.

pulumi up
Please choose a stack, or create a new one: <create a new stack>
Please enter your desired stack name.
To create a stack in an organization, use the format <org-name>/<stack-name> (e.g. `acmecorp/dev`): dev
Please enter your desired stack name.
Created stack 'dev'
Previewing update (dev)

View in Browser (Ctrl+O): https://app.pulumi.com/dirien/deepseek-ollama-typescript/dev/previews/1dbb18ea-ba31-4d5b-9510-5dce19eb8ee8

     Type                              Name                            Plan
 +   pulumi:pulumi:Stack               deepseek-ollama-typescript-dev  create
 +   ├─ aws:ec2:KeyPair                deepSeekKey                     create
 +   ├─ aws:ec2:Vpc                    deepSeekVpc                     create
 +   ├─ aws:iam:Role                   deepSeekRole                    create
 +   ├─ aws:iam:InstanceProfile        deepSeekProfile                 create
 +   ├─ aws:ec2:SecurityGroup          deepSeekSecurityGroup           create
 +   ├─ aws:ec2:RouteTable             deepSeekRouteTable              create
 +   ├─ aws:ec2:InternetGateway        deepSeekInternetGateway         create
 +   ├─ aws:ec2:Subnet                 deepSeekSubnet                  create
 +   ├─ aws:iam:RolePolicyAttachment   deepSeekS3Policy                create
 +   ├─ aws:ec2:RouteTableAssociation  deepSeekRouteTableAssociation   create
 +   └─ aws:ec2:Instance               deepSeekInstance                create

Outputs:
    amiId            : "ami-085131ff43045c877"
    instanceId       : output<string>
    instancePublicDns: output<string>

Resources:
    + 12 to create

Do you want to perform this update? yes
Updating (dev)

View in Browser (Ctrl+O): https://app.pulumi.com/dirien/deepseek-ollama-typescript/dev/updates/1

     Type                              Name                            Status
 +   pulumi:pulumi:Stack               deepseek-ollama-typescript-dev  created (40s)
 +   ├─ aws:ec2:KeyPair                deepSeekKey                     created (0.47s)
 +   ├─ aws:iam:Role                   deepSeekRole                    created (1s)
 +   ├─ aws:ec2:Vpc                    deepSeekVpc                     created (12s)
 +   ├─ aws:iam:InstanceProfile        deepSeekProfile                 created (6s)
 +   ├─ aws:iam:RolePolicyAttachment   deepSeekS3Policy                created (0.90s)
 +   ├─ aws:ec2:InternetGateway        deepSeekInternetGateway         created (0.69s)
 +   ├─ aws:ec2:Subnet                 deepSeekSubnet                  created (11s)
 +   ├─ aws:ec2:SecurityGroup          deepSeekSecurityGroup           created (2s)
 +   ├─ aws:ec2:RouteTable             deepSeekRouteTable              created (1s)
 +   ├─ aws:ec2:RouteTableAssociation  deepSeekRouteTableAssociation   created (0.92s)
 +   └─ aws:ec2:Instance               deepSeekInstance                created (12s)

Outputs:
    amiId            : "ami-085131ff43045c877"
    instanceId       : "i-0ae7495781ace3e81"
    instancePublicDns: "18.159.211.136"

Resources:
    + 12 created

Duration: 42s

The infrastructure provisions in under a minute, but the cloud-init script needs another 5–10 minutes to install drivers, pull container images, and download the model weights. SSH in to watch the progress, or just wait and load the Web UI when it’s ready.

ssh -i deepseek.pem ec2-user@<instance-public-ip>

Check the container status:

sudo docker ps
CONTAINER ID   IMAGE                                COMMAND               CREATED         STATUS                   PORTS                                           NAMES
c8714335e205   ghcr.io/open-webui/open-webui:main   "bash start.sh"       6 minutes ago   Up 6 minutes (healthy)   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp       open-webui
bf4bb3b7ede1   ollama/ollama                        "/bin/ollama serve"   8 minutes ago   Up 7 minutes             0.0.0.0:11434->11434/tcp, :::11434->11434/tcp   ollama
[ec2-user@ip-10-0-58-122 ~]$

Step 6: Access the Web UI or API

Once the containers are healthy, open http://<instance-public-ip>:3000 in your browser for Open WebUI:

Open WebUI is not secured by default. Restrict the security group to your IP, put it behind an authenticated proxy, or terminate TLS at an ALB before exposing it to the internet.

A screenshot of a chat interface with DeepSeek-R1:7B, showing a query asking for the square root of 144. The AI responds with a step-by-step explanation, defining the square root, setting up the equation, solving for x, and confirming that \sqrt{144} = 12. The interface has a dark theme, with the query displayed at the top and the AI’s structured response below.

For programmatic access, Ollama exposes an OpenAI-compatible API on port 11434. Most clients written for the OpenAI SDK only need a base-URL change:

from openai import OpenAI

client = OpenAI(
    base_url="http://<instance-public-ip>:11434/v1",
    api_key="ollama",  # required by the SDK, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)

How do I switch models?

Ollama hosts every major open-weight family in its model library. Pulling a different model is two commands inside the EC2 instance—or a one-line edit to the cloud-init script if you want it provisioned automatically:

# DeepSeek-R1 distill (default in this guide)
docker exec ollama ollama run deepseek-r1:7b

# Llama 3.1 (Meta, 8B)
docker exec ollama ollama run llama3.1:8b

# Qwen 2.5 (Alibaba, 7B)
docker exec ollama ollama run qwen2.5:7b

# Mistral (7B)
docker exec ollama ollama run mistral:7b

# Larger reasoning model (needs g5.2xlarge or larger)
docker exec ollama ollama run deepseek-r1:32b

Tags follow a <size> or <size>-<quantization> pattern—8b, 8b-instruct-q4_K_M, 8b-instruct-q8_0. Q4 is the default and the right starting point; bump to Q8 only if you have spare VRAM and notice quality issues with Q4. Browse the full tag list for any model on its Ollama library page.

Cleaning up

When you’re done, tear everything down:

pulumi destroy

What are the next steps?

You now have a reproducible, IaC-managed deployment of any open-source LLM on AWS. The infrastructure is fixed; the model is a parameter. From here, the natural extensions are wiring this up to a real application, adding RAG over your own data, or moving the deployment behind an authenticated load balancer.

If you want to go further with AI on Pulumi, here are some related guides:

Try Pulumi for Free

Changelog

  • 2026-04-30 — Broadened scope from DeepSeek-only to any Ollama-supported model (Llama, Qwen, Mistral). Added TL;DR, instance-type recommendation table, cost-vs-hosted-API comparison, and HowTo structured data. Restructured headings as user questions. Verified Ollama and cloud-init commands against current versions.
  • 2025-03-10 — Minor edits and corrections.
  • 2025-01-27 — Original post: Run DeepSeek-R1 on AWS EC2 Using Ollama.

The infrastructure as code platform for any cloud.