Deploy a Scalable Fargate Service with Multiple Replicas

By Pulumi Team

The Challenge

You need high availability and redundancy for a containerized application. Running multiple replicas ensures your service stays available even if individual containers fail, and distributes traffic so no single instance is overloaded.

What You'll Build

  • Fargate service running multiple replicas across availability zones
  • Application Load Balancer distributing traffic across all replicas
  • Automatic health checks and container replacement
  • ECR repository with Docker image built from local code
  • Public load balancer endpoint for accessing the application

Try This Prompt in Pulumi Neo

Run this prompt in Neo to deploy your infrastructure, or edit it to customize.

Best For

Use this prompt when you need high availability for a containerized application. Ideal for production workloads that require redundancy, zero-downtime deployments, or horizontal scaling to handle traffic spikes.

Architecture Overview

This architecture deploys your application as a multi-replica Fargate service behind an Application Load Balancer. The core idea is straightforward: instead of running a single container that represents a single point of failure, you run several identical copies of your container. The load balancer distributes incoming requests across all healthy replicas, so if one container fails or becomes unresponsive, traffic automatically shifts to the remaining containers while ECS launches a replacement.

Fargate handles the underlying compute. You specify CPU and memory requirements per container, and AWS provisions the capacity needed for each replica. There are no EC2 instances to manage, patch, or right-size. ECS maintains your desired replica count by continuously monitoring task health and launching new tasks when existing ones are terminated or fail health checks.

The Application Load Balancer provides the single entry point for all traffic. It performs health checks against each Fargate task at regular intervals and removes unhealthy tasks from the target group so they stop receiving traffic while ECS replaces them. This gives your deployment self-healing behavior without any custom monitoring or restart logic.
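A minimal sketch of this architecture in Pulumi TypeScript, using the Crosswalk (@pulumi/awsx) components, might look like the following. The resource names, the three-replica count, the ./app build context, and container port 80 are illustrative assumptions rather than values prescribed by this guide.

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

// ECR repository and an image built from a local Dockerfile (assumed to live in ./app).
const repo = new awsx.ecr.Repository("app-repo", { forceDelete: true });
const image = new awsx.ecr.Image("app-image", {
    repositoryUrl: repo.url,
    context: "./app",
});

// ECS cluster that groups the Fargate tasks.
const cluster = new aws.ecs.Cluster("app-cluster", {});

// Application Load Balancer that fronts all replicas.
const lb = new awsx.lb.ApplicationLoadBalancer("app-lb", {});

// Fargate service running three identical replicas registered with the ALB.
const service = new awsx.ecs.FargateService("app-svc", {
    cluster: cluster.arn,
    desiredCount: 3,
    assignPublicIp: true,
    taskDefinitionArgs: {
        container: {
            name: "app",
            image: image.imageUri,
            cpu: 256,
            memory: 512,
            essential: true,
            portMappings: [{ containerPort: 80, targetGroup: lb.defaultTargetGroup }],
        },
    },
});

// Public endpoint for the application.
export const url = pulumi.interpolate`http://${lb.loadBalancer.dnsName}`;
```

Running `pulumi up` against a program along these lines builds and pushes the image, provisions the cluster and load balancer, starts the replicas, and exports the public URL.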

ECS Cluster and Service Configuration

The ECS cluster is the logical grouping for your Fargate tasks. Within the cluster, the ECS service defines the desired number of replicas, the subnets they run in, and how they connect to the load balancer. As long as those subnets span multiple availability zones, ECS spreads replicas across them, so an AZ outage does not take down your entire service.

The service also manages rolling deployments. When you update the container image or task definition, ECS gradually replaces old tasks with new ones while maintaining the minimum healthy count. This enables zero-downtime deployments without any additional tooling.
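As a sketch of how these settings could be expressed, the service from the overview above can be given an explicit replica count and rolling-deployment bounds. These arguments are forwarded to the underlying aws.ecs.Service; the 50%/200% values below are illustrative, and `cluster`, `image`, and `lb` refer to the resources from the earlier sketch.

```typescript
import * as awsx from "@pulumi/awsx";

// Variant of the earlier service definition with explicit rolling-deployment bounds.
const service = new awsx.ecs.FargateService("app-svc", {
    cluster: cluster.arn,
    desiredCount: 3,                      // ECS keeps three tasks running at all times
    deploymentMinimumHealthyPercent: 50,  // never drop below half the replicas during a deploy
    deploymentMaximumPercent: 200,        // allow up to double the replicas while new tasks start
    assignPublicIp: true,
    taskDefinitionArgs: {
        container: {
            name: "app",
            image: image.imageUri,
            cpu: 256,
            memory: 512,
            essential: true,
            portMappings: [{ containerPort: 80, targetGroup: lb.defaultTargetGroup }],
        },
    },
});
```

With bounds like these, ECS replaces tasks in batches during an update, keeping at least half of the desired count in service while new tasks start and pass health checks.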

Load Balancer and Health Checks

The ALB uses a target group to track which Fargate tasks are healthy and eligible for traffic. Health checks run at configurable intervals, and tasks that fail a threshold number of consecutive checks are deregistered. ECS then terminates the unhealthy task and launches a replacement to maintain the desired replica count.

This combination of ALB health checks and ECS desired count creates a self-healing loop: failures are detected, unhealthy tasks are removed from traffic, replacements are launched, and new tasks are registered with the load balancer once they pass health checks.
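To make those knobs concrete, here is a sketch of the target-group settings involved, written against the raw aws.lb.TargetGroup resource; if you use the Crosswalk ApplicationLoadBalancer from the earlier sketch, a healthCheck block like this can be supplied through its defaultTargetGroup argument instead. The /healthz path, intervals, and thresholds are assumptions to adapt to your application.

```typescript
import * as aws from "@pulumi/aws";

// Look up the default VPC purely to keep the sketch self-contained.
const defaultVpc = aws.ec2.getVpcOutput({ default: true });

const targetGroup = new aws.lb.TargetGroup("app-tg", {
    vpcId: defaultVpc.id,
    port: 80,
    protocol: "HTTP",
    targetType: "ip",               // Fargate tasks use awsvpc networking and register by IP
    healthCheck: {
        path: "/healthz",           // endpoint the application is assumed to expose
        interval: 15,               // seconds between checks
        timeout: 5,                 // seconds to wait for a response
        healthyThreshold: 2,        // consecutive passes before a task receives traffic
        unhealthyThreshold: 3,      // consecutive failures before a task is deregistered
        matcher: "200",             // HTTP status code treated as healthy
    },
});
```

Shorter intervals and lower thresholds detect failures faster at the cost of more health-check traffic; the right balance depends on how quickly your application can answer the check.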

ECR and Image Management

The ECR repository stores your Docker image. The deployment builds the image from a local Dockerfile, pushes it to ECR, and references it in the task definition. All Fargate tasks pull from this same image, ensuring consistency across replicas.
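A focused sketch of that piece, again with @pulumi/awsx: the repository and the image build are separate resources, and the context and Dockerfile paths below are assumptions about where your application code lives.

```typescript
import * as awsx from "@pulumi/awsx";

// Repository for the application image; forceDelete lets `pulumi destroy` remove it
// even if it still contains images.
const repo = new awsx.ecr.Repository("app-repo", { forceDelete: true });

// Build the local Dockerfile and push the result to ECR on `pulumi up`.
const image = new awsx.ecr.Image("app-image", {
    repositoryUrl: repo.url,
    context: "./app",                   // build context directory (assumed)
    dockerfile: "./app/Dockerfile",     // explicit Dockerfile path (assumed)
    platform: "linux/amd64",            // match the Fargate task architecture
});

// Every replica's container definition points at this URI, so all tasks run the same build.
export const imageUri = image.imageUri;
```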

Common Customizations

  • Enable service autoscaling: Add ECS service autoscaling policies based on CPU utilization or ALB request count so the replica count adjusts dynamically with traffic (see the sketch after this list).
  • Add HTTPS termination: Request an ACM certificate and configure an HTTPS listener on the ALB to encrypt traffic between clients and the load balancer.
  • Configure sticky sessions: Enable session affinity on the target group if your application stores state locally, routing the same client to the same replica.
  • Use a custom domain: Add a Route53 alias record pointing your domain to the ALB for a user-friendly URL.
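As one example of the first customization, a sketch of target-tracking autoscaling on CPU could look like the following. `cluster` and `service` refer to the resources from the earlier sketches, and the capacity range and 60% CPU target are illustrative assumptions.

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Register the ECS service's desired count as a scalable target.
const scalingTarget = new aws.appautoscaling.Target("app-scaling-target", {
    serviceNamespace: "ecs",
    scalableDimension: "ecs:service:DesiredCount",
    resourceId: pulumi.interpolate`service/${cluster.name}/${service.service.name}`,
    minCapacity: 3,                 // never scale below the baseline replica count
    maxCapacity: 10,                // upper bound during traffic spikes
});

// Track average CPU utilization and adjust the replica count to hold it near the target.
new aws.appautoscaling.Policy("app-cpu-scaling", {
    policyType: "TargetTrackingScaling",
    serviceNamespace: scalingTarget.serviceNamespace,
    scalableDimension: scalingTarget.scalableDimension,
    resourceId: scalingTarget.resourceId,
    targetTrackingScalingPolicyConfiguration: {
        predefinedMetricSpecification: {
            predefinedMetricType: "ECSServiceAverageCPUUtilization",
        },
        targetValue: 60,            // aim for roughly 60% average CPU
        scaleInCooldown: 120,       // seconds to wait before scaling in again
        scaleOutCooldown: 60,       // seconds to wait before scaling out again
    },
});
```

Swapping the predefined metric for ALBRequestCountPerTarget scales on request volume instead of CPU, which can react faster for I/O-bound services.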