Deploy a Multi-Region Active-Active Application

By Pulumi Team

The Challenge

You need an application that survives a full regional outage without downtime. Single-region architectures, even with multiple availability zones, are vulnerable to region-level failures. Active-active multi-region deployment ensures that your application serves users from both regions simultaneously, and traffic automatically shifts to the healthy region if one goes down.

What You'll Build

  • Active-active deployment across two regions
  • Global database with cross-region replication
  • Geographic DNS routing to nearest region
  • Automatic health-based failover
  • Global session storage

Try This Prompt in Pulumi Neo

Run this prompt in Neo to deploy your infrastructure, or edit it to customize.

Best For

Use this prompt when your application has strict uptime requirements that exceed what a single region can provide, when you need to serve users on multiple continents with low latency, or when regulatory requirements mandate geographic data residency with cross-region failover.

Architecture Overview

This architecture deploys your application in two AWS regions simultaneously, with both regions actively serving traffic. DNS-based routing directs users to the nearest region based on their geographic location, which reduces latency and distributes load. A global database replicates data across both regions, and global session storage ensures user sessions persist through failover events.

The “active-active” distinction is important. Unlike active-passive configurations where a standby region sits idle until a failure occurs, both regions in this architecture handle real traffic at all times. This means failover is seamless because the receiving region is already warm, running the same application code, and serving other users. There is no cold-start penalty when traffic shifts.

The primary challenge is data consistency. With writes happening in both regions, the database must reconcile potentially conflicting changes. A global database with write forwarding addresses this by designating one region as the primary writer and forwarding writes from the secondary region. This provides strong consistency for writes while allowing low-latency reads from either region. Global session storage uses a similar replication model so user sessions are available in both regions.
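As a concrete sketch of the session-storage side, a DynamoDB global table replicates session items across both regions with typically sub-second lag. The table name, key schema, and regions below are illustrative assumptions, not part of the prompt:

```typescript
import * as aws from "@pulumi/aws";

// Hypothetical session table; provisioned in the stack's default region.
const sessions = new aws.dynamodb.Table("sessions", {
    attributes: [{ name: "sessionId", type: "S" }],
    hashKey: "sessionId",
    billingMode: "PAY_PER_REQUEST",
    // Streams are required before DynamoDB can replicate cross-region.
    streamEnabled: true,
    streamViewType: "NEW_AND_OLD_IMAGES",
    // Declaring a replica turns this into a global table: writes in either
    // region replicate to the other, so sessions survive a regional failover.
    replicas: [{ regionName: "eu-west-1" }],
    // Expire stale sessions automatically via a TTL attribute.
    ttl: { attributeName: "expiresAt", enabled: true },
});
```

DynamoDB global tables use last-writer-wins conflict resolution, which is usually acceptable for session data even though the primary database needs the stricter write-forwarding model described above.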

Compute Layer

Each region runs identical compute infrastructure on Fargate, with the same container images, task definitions, and scaling configurations. A load balancer in each region distributes traffic across tasks and performs health checks. The compute layer is stateless; all state lives in the global database and session store, which is what makes failover possible.
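One way to keep the two regions identical is to stamp them out from a single function parameterized by region. This sketch uses Pulumi's `awsx` convenience components; the helper name, image, ports, and sizes are assumptions for illustration:

```typescript
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";

// Hypothetical helper: identical stateless compute in a given region.
function regionalService(region: aws.Region, image: string) {
    const provider = new aws.Provider(`provider-${region}`, { region });
    const cluster = new aws.ecs.Cluster(`cluster-${region}`, {}, { provider });
    // Regional load balancer: distributes traffic and runs health checks.
    const lb = new awsx.lb.ApplicationLoadBalancer(`lb-${region}`, {}, { provider });
    new awsx.ecs.FargateService(`app-${region}`, {
        cluster: cluster.arn,
        desiredCount: 2,
        taskDefinitionArgs: {
            container: {
                name: "app",
                image,  // same container image in every region
                cpu: 256,
                memory: 512,
                portMappings: [{ containerPort: 80, targetGroup: lb.defaultTargetGroup }],
            },
        },
    }, { provider });
    return { url: lb.loadBalancer.dnsName };
}

const east = regionalService("us-east-1", "my-app:latest");
const west = regionalService("eu-west-1", "my-app:latest");
```

Because both regions come from the same function, configuration drift between them is impossible by construction, which is exactly what a warm active-active failover relies on.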

Global Database

The global database replicates data between regions with low latency. The primary region accepts writes directly, and the secondary region forwards writes to the primary. Both regions serve reads from their local replica, so read latency is low regardless of which region the user connects to. If the primary region fails, the secondary is promoted to primary and the application continues; because replication lag is typically under a second, the recovery point is near-zero, though in-flight writes during an unplanned failure may be lost.
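This pattern maps onto an Aurora global database with write forwarding enabled on the secondary. A minimal sketch, assuming Aurora MySQL, illustrative regions, and a `dbPassword` secret in stack config:

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const cfg = new pulumi.Config();
const primary = new aws.Provider("primary", { region: "us-east-1" });
const secondary = new aws.Provider("secondary", { region: "eu-west-1" });

// Global cluster that spans both regions.
const globalDb = new aws.rds.GlobalCluster("global-db", {
    globalClusterIdentifier: "app-global",
    engine: "aurora-mysql",
    databaseName: "app",
});

// Primary cluster: accepts writes directly.
const primaryCluster = new aws.rds.Cluster("primary-cluster", {
    engine: "aurora-mysql",
    globalClusterIdentifier: globalDb.id,
    masterUsername: "admin",
    masterPassword: cfg.requireSecret("dbPassword"),
}, { provider: primary });

// Secondary cluster: serves local reads, forwards writes to the primary.
new aws.rds.Cluster("secondary-cluster", {
    engine: "aurora-mysql",
    globalClusterIdentifier: globalDb.id,
    enableGlobalWriteForwarding: true,
}, { provider: secondary, dependsOn: [primaryCluster] });
```

The `dependsOn` is needed because the secondary can only join the global cluster after the primary exists.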

Traffic Routing and Failover

DNS-based routing uses health checks to monitor application endpoints in both regions. Under normal operation, users are routed to the closest region. When health checks detect a failure in one region, DNS automatically shifts traffic to the healthy region. The total failover time depends on both the health-check interval (how quickly the failure is detected) and the DNS TTL (how long clients cache the stale answer); lowering either speeds up failover at the cost of higher health-check and DNS query volume.
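In Route 53 terms, this is a health check per region plus geolocation records that share one name. The domain, endpoints, and continent mapping below are placeholder assumptions:

```typescript
import * as aws from "@pulumi/aws";

// Assume the hosted zone for the domain already exists.
const zone = aws.route53.getZoneOutput({ name: "example.com" });

// Health check for the east region's endpoint.
const eastCheck = new aws.route53.HealthCheck("east-check", {
    type: "HTTPS",
    fqdn: "east.example.com",
    resourcePath: "/healthz",
    requestInterval: 10,  // fast checks: 10s instead of the 30s default
    failureThreshold: 3,  // unhealthy after 3 consecutive failures
});

// Geolocation record: North American users resolve east, while healthy.
new aws.route53.Record("east-record", {
    zoneId: zone.zoneId,
    name: "app.example.com",
    type: "CNAME",
    ttl: 30,              // low TTL so clients re-resolve quickly on failover
    setIdentifier: "east",
    geolocationRoutingPolicies: [{ continent: "NA" }],
    healthCheckId: eastCheck.id,
    records: ["east.example.com"],
});
// A mirror-image record for continent "EU" (plus a default "*" record for
// everyone else) points at the west endpoint with its own health check.
// When a check fails, Route 53 stops returning that record and all
// queries resolve to the remaining healthy region.
```

Note that Route 53 geolocation routing requires an explicit default record, or users outside the mapped continents get no answer at all.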

Common Customizations

  • Add a third region: Extend the deployment to a third region for additional geographic coverage and redundancy.
  • Add CDN caching: Request a global CDN distribution with origins in both regions for static content and API caching.
  • Add data residency controls: Ask for routing rules that keep specific users’ data in a designated region to comply with data sovereignty regulations like GDPR.
  • Add chaos testing: Request automated failover testing that simulates regional failures on a schedule to validate that your failover mechanisms work correctly.
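When validating failover (for example, in the chaos tests suggested above), it helps to have a timing budget to assert against. This back-of-envelope sketch, under the assumption that detection requires `failureThreshold` consecutive failed checks and that clients cache DNS answers for up to the TTL, gives a worst-case estimate; real-world resolver caching can stretch it further:

```typescript
// Worst-case seconds from regional failure until traffic has shifted:
// time to detect the failure, plus time for cached DNS answers to expire.
function worstCaseFailoverSeconds(
    requestIntervalSec: number,
    failureThreshold: number,
    ttlSec: number,
): number {
    return requestIntervalSec * failureThreshold + ttlSec;
}

// 10s checks, 3 consecutive failures, 30s TTL:
console.log(worstCaseFailoverSeconds(10, 3, 30)); // 60 seconds
```

If your chaos test observes failover consistently exceeding this budget, the likely culprits are resolvers ignoring the TTL or health checks that pass while the application is only partially degraded.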