Deploy Multi-Region Disaster Recovery on AWS

By Pulumi Team

The Challenge

You need your application to stay available even if an entire AWS region goes down. A single-region deployment means any regional outage, whether caused by network issues, service degradation, or infrastructure failure, takes your application offline until AWS resolves it. Multi-region disaster recovery eliminates this single point of failure.

What You'll Build

  • Active-passive deployment across two AWS regions
  • Cross-region database replication with low replication lag
  • Automated health checks and DNS failover routing
  • Monitoring and alerting for failover events
  • Defined recovery time and recovery point objectives

Try This Prompt in Pulumi Neo

Run this prompt in Neo to deploy your infrastructure, or edit it to customize.

Best For

Use this prompt when you need to protect a critical application against regional outages. Appropriate for production workloads with SLA requirements, applications where downtime has direct revenue impact, or any system that needs a documented disaster recovery plan.

Architecture Overview

This architecture deploys your application across two AWS regions in an active-passive configuration. The primary region handles all production traffic under normal conditions. The secondary region maintains a synchronized copy of your infrastructure and data, ready to accept traffic if the primary region fails.

The critical design decision is choosing active-passive over active-active. Active-passive is simpler to reason about because only one region serves traffic at a time, which avoids the complexity of data conflict resolution and session affinity across regions. The trade-off is that the standby region’s compute capacity sits idle during normal operations, adding cost without serving requests. For most applications, this cost is justified by the simpler operational model and lower risk of data inconsistency during failover.

DNS-based failover through Route 53 health checks provides the traffic-switching mechanism. Health checks continuously poll your primary region’s endpoints. When they detect failures, Route 53 automatically starts answering DNS queries with the standby region’s records. This approach works without application changes and handles most failure scenarios, though failover is not instantaneous: resolvers and clients keep using cached records until their TTL expires.
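To make the "not instantaneous" point concrete, the sketch below gives a rough upper-bound estimate of how long clients may keep reaching the failed region. It assumes detection takes about the health check request interval multiplied by the failure threshold, and that clients honor the record TTL. The names `FailoverTiming` and `estimate_failover_seconds` and the sample values are illustrative assumptions, not part of the prompt or of any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTiming:
    """Inputs for a rough DNS failover estimate (all values in seconds)."""
    request_interval: int   # seconds between health check requests
    failure_threshold: int  # consecutive failures before the endpoint counts as unhealthy
    record_ttl: int         # TTL on the failover DNS record


def estimate_failover_seconds(t: FailoverTiming) -> int:
    """Approximate upper bound on the time until clients reach the standby region."""
    # Detection: the endpoint must fail `failure_threshold` checks in a row.
    detection = t.request_interval * t.failure_threshold
    # Propagation: cached answers can live for up to one full TTL after the switch.
    return detection + t.record_ttl


if __name__ == "__main__":
    # Illustrative values: 30-second checks, 3 failures, 60-second TTL.
    timing = FailoverTiming(request_interval=30, failure_threshold=3, record_ttl=60)
    print(estimate_failover_seconds(timing))  # 150
```

Shortening the interval or the TTL lowers this estimate, at the price of more health check traffic and more DNS queries. Clients or resolvers that ignore TTLs can take longer than the estimate suggests.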

Primary Region Stack

The primary region runs the full application stack: containerized services on ECS or EKS, a relational database with automated backups, caching layers, and any supporting services. This region handles all read and write traffic under normal operating conditions.

Standby Region Stack

The standby region mirrors the primary region’s infrastructure. Compute resources can be scaled down (or even set to zero) to reduce cost, since they only need to handle traffic during a failover event. The database maintains a continuously replicated copy from the primary region.

Failover Automation

Lambda functions or Step Functions orchestrate the parts of the failover process that DNS routing does not handle on its own. This includes promoting the standby database to primary, scaling up compute resources in the standby region, and sending notifications to the operations team. Automating these steps shortens recovery time and reduces the risk of human error during a stressful outage.
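The ordering of those steps can be sketched as a small runner that executes each step in sequence, stops at the first failure, and always notifies the operations team. This is a minimal illustration of the control flow only: `run_failover`, the step names, and the `notify` callback are hypothetical, and in a real deployment each step would call the relevant AWS API from a Lambda function or a Step Functions state.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus a zero-argument action.
Step = Tuple[str, Callable[[], None]]


def run_failover(steps: List[Step], notify: Callable[[str], None]) -> List[str]:
    """Run failover steps in order; stop at the first failure and report it."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Halt so later steps never run against a half-failed-over system.
            notify(f"failover halted at '{name}': {exc}")
            return completed
        completed.append(name)
    notify("failover complete: " + ", ".join(completed))
    return completed


if __name__ == "__main__":
    # Placeholder actions; real ones would promote the replica and scale compute.
    steps: List[Step] = [
        ("promote-database", lambda: None),
        ("scale-up-compute", lambda: None),
    ]
    run_failover(steps, notify=print)
```

Promoting the database before scaling compute is a design choice in this sketch: application instances that start before the database accepts writes would fail their own health checks.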

Common Customizations

  • Switch to active-active: Modify the prompt to request active-active routing for lower latency across geographies, accepting the added complexity of write conflict resolution and data synchronization.
  • Add RTO/RPO targets: Specify explicit recovery time (RTO) and recovery point (RPO) objectives, such as “5-minute RPO and 15-minute RTO,” to guide the replication and failover design.
  • Include runbook generation: Ask Neo to generate operational runbooks for manual failover procedures, testing schedules, and post-failover validation steps.
  • Extend to three regions: Add a third region for applications that need to survive simultaneous failures in two regions, or for geographic distribution across continents.
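Explicit RTO and RPO targets are most useful when failover drills are checked against them. The sketch below compares a measured replication lag and a measured recovery time with the example targets from the list above (5-minute RPO, 15-minute RTO). The `RecoveryObjectives` type, the `check_objectives` function, and the measured values are illustrative assumptions; how you measure lag and recovery time depends on your database and tooling.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def check_objectives(
    targets: RecoveryObjectives,
    replication_lag: timedelta,
    measured_recovery: timedelta,
) -> List[str]:
    """Return the names of the objectives that the measurements violate."""
    violations: List[str] = []
    # Replication lag at the moment of failure bounds how much data can be lost.
    if replication_lag > targets.rpo:
        violations.append("RPO")
    # Recovery time runs from the start of the outage to traffic being served again.
    if measured_recovery > targets.rto:
        violations.append("RTO")
    return violations


if __name__ == "__main__":
    targets = RecoveryObjectives(rpo=timedelta(minutes=5), rto=timedelta(minutes=15))
    # Hypothetical drill results: 20 seconds of lag, 18 minutes to recover.
    print(check_objectives(targets, timedelta(seconds=20), timedelta(minutes=18)))  # ['RTO']
```

A check like this fits naturally into a scheduled failover test, so that drift away from the targets shows up before a real outage does.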