Networking and Load Balancing
This page covers network architecture, multi-AZ deployment, auto-scaling, load balancing, and DNS configuration for production self-hosted deployments.
For ingress/egress port requirements, see Network requirements.
Multi-AZ deployment
Deploy compute resources across at least two availability zones:
- Kubernetes: Spread pods across AZs using pod anti-affinity or topology spread constraints.
- ECS/Fargate: Configure services to launch tasks across multiple subnets in different AZs.
- VMs: Use auto-scaling groups spanning multiple AZs.
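For the Kubernetes case, a topology spread constraint might look like the sketch below, written as a Python dictionary that mirrors the pod spec fields. The `app: api` label is a hypothetical placeholder; substitute the labels your API pods actually carry.

```python
import json


def zone_spread_constraint(app_label: str, max_skew: int = 1) -> dict:
    """Build one topologySpreadConstraints entry that spreads pods across zones."""
    return {
        # Largest allowed difference in matching pod count between any two zones.
        "maxSkew": max_skew,
        # Well-known node label that carries the availability zone.
        "topologyKey": "topology.kubernetes.io/zone",
        # Refuse to schedule rather than violate the spread.
        "whenUnsatisfiable": "DoNotSchedule",
        # Assumption: API pods are labeled app=<app_label>.
        "labelSelector": {"matchLabels": {"app": app_label}},
    }


pod_spec_fragment = {"topologySpreadConstraints": [zone_spread_constraint("api")]}
print(json.dumps(pod_spec_fragment, indent=2))
```

Using `ScheduleAnyway` instead of `DoNotSchedule` turns the constraint into a preference, which may suit clusters where one zone is occasionally short on capacity.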
Network architecture
Deploy a VPC/VNet with subnets in 2+ AZs, separated by tier:
- Public subnets: Load balancers, NAT gateways
- Private subnets: Application containers
- Database subnets: Database instances (no public access)
Additional recommendations:
- Deploy one NAT gateway per AZ to avoid cross-AZ NAT bottlenecks.
- Use VPC endpoints or private endpoints for object storage access to reduce NAT traffic and improve performance.
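As a planning aid, the tier-by-AZ layout above can be sketched with Python's standard `ipaddress` module. The `10.0.0.0/16` range, the `/20` subnet size, and the AZ names are illustrative assumptions, not recommended values.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, tiers=("public", "private", "database"), new_prefix=20):
    """Assign one non-overlapping subnet to every (tier, AZ) pair."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = list(network.subnets(new_prefix=new_prefix))
    needed = len(tiers) * len(azs)
    if needed > len(blocks):
        raise ValueError(f"{vpc_cidr} only holds {len(blocks)} /{new_prefix} subnets, need {needed}")
    pairs = [(tier, az) for tier in tiers for az in azs]
    return dict(zip(pairs, blocks))


# Hypothetical VPC range and AZ names, for illustration only.
plan = plan_subnets("10.0.0.0/16", ["az-a", "az-b"])
for (tier, az), cidr in plan.items():
    print(f"{tier:<9} {az}  {cidr}")
```

Carving equal-sized blocks keeps the plan simple; real deployments often give private subnets more address space than public or database subnets.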
Auto-scaling
Configure auto-scaling for the API service based on CPU utilization:
- Target: 50–60% average CPU utilization
- Minimum instances: 2 (for HA across AZs)
- Maximum instances: 4x desired count (to handle burst traffic)
- Scale-in protection: Use graceful draining to allow in-flight requests to complete before terminating instances
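To see how these settings interact, the sketch below applies the proportional scaling rule documented for the Kubernetes Horizontal Pod Autoscaler and clamps the result to the minimum and maximum above. Cloud target-tracking policies behave similarly but not identically. The 55% target and the maximum of 8 (4x an assumed desired count of 2) are example values.

```python
import math


def desired_instances(current, current_cpu, target_cpu=55.0, minimum=2, maximum=8):
    """Return the instance count that should bring average CPU back to the target.

    Uses desired = ceil(current * current_cpu / target_cpu), clamped to
    [minimum, maximum]. The default maximum assumes a desired count of 2 (4 x 2 = 8).
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    raw = math.ceil(current * current_cpu / target_cpu)
    return max(minimum, min(maximum, raw))


print(desired_instances(2, 90.0))   # scale out under load
print(desired_instances(4, 20.0))   # scale in, but never below the minimum
print(desired_instances(4, 200.0))  # burst is capped at the maximum
```

The clamp is what keeps at least two instances running across AZs even when traffic is idle.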
Graceful shutdown
When scaling in or deploying updates, ensure containers have time to finish in-flight requests:
- Set container stop timeout to at least 120 seconds.
- Kubernetes: Set terminationGracePeriodSeconds: 130 on the API pod spec (slightly above the 120-second stop timeout to allow clean shutdown before Kubernetes force-kills the pod).
- ECS: Configure deregistration delay on the target group and set stopTimeout on the container definition.
- Use lifecycle hooks (ECS) or preStop hooks (Kubernetes) to drain connections before shutdown.
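A minimal sketch of the Kubernetes settings, again as a Python dictionary mirroring the pod spec. The container name `api`, the 10-second margin constant, and the 15-second `sleep` in the preStop hook are assumptions for illustration; the sleep also assumes the image ships a `sleep` binary.

```python
STOP_TIMEOUT_SECONDS = 120   # time allowed for in-flight requests to finish
GRACE_MARGIN_SECONDS = 10    # headroom before Kubernetes force-kills the pod


def api_pod_spec_fragment(drain_seconds: int = 15) -> dict:
    """Build the shutdown-related part of the API pod spec."""
    return {
        "terminationGracePeriodSeconds": STOP_TIMEOUT_SECONDS + GRACE_MARGIN_SECONDS,
        "containers": [
            {
                "name": "api",  # hypothetical container name
                "lifecycle": {
                    # Pause before SIGTERM is sent so the load balancer can
                    # stop routing new requests to this pod.
                    "preStop": {"exec": {"command": ["sleep", str(drain_seconds)]}}
                },
            }
        ],
    }


fragment = api_pod_spec_fragment()
print(fragment["terminationGracePeriodSeconds"])
```

Note that the grace period countdown starts when termination begins, so time spent in the preStop hook counts against the 130 seconds.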
Load balancer configuration
- Deploy an Application Load Balancer (or equivalent).
- Enable deletion protection on the load balancer in production.
- Configure health checks for the API service (/api/status endpoint).
- Set health check grace period to 120 seconds to allow containers to start.
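To verify the health check path by hand before wiring it into the load balancer, a small probe such as the following may help. It uses only the Python standard library and treats any 2xx response as healthy, which is an assumption about how the endpoint reports status; the base URL you pass in is your own.

```python
import urllib.error
import urllib.request


def status_url(base_url: str) -> str:
    """Build the health check URL for a given API base URL."""
    return base_url.rstrip("/") + "/api/status"


def api_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the status endpoint answers with a 2xx response."""
    try:
        with urllib.request.urlopen(status_url(base_url), timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False
```

For example, `api_is_healthy("https://api.example.com")` would probe `https://api.example.com/api/status`, where `api.example.com` is a placeholder hostname.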
DNS
Configure two DNS records pointing to your load balancer:
- api.{domain} - for CLI and API access
- app.{domain} - for web console access
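Once the records are in place, a quick resolution check can confirm that both names exist. This sketch uses the standard `socket` module; `example.com` in the usage note is a placeholder domain.

```python
import socket


def expected_hostnames(domain: str) -> list:
    """Return the two hostnames that should point at the load balancer."""
    return [f"api.{domain}", f"app.{domain}"]


def resolve(hostname: str) -> set:
    """Return the set of IP addresses a hostname resolves to (empty if none)."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def report(domain: str) -> dict:
    """Map each expected hostname to the addresses it currently resolves to."""
    return {name: resolve(name) for name in expected_hostnames(domain)}
```

Calling `report("example.com")` returns a dictionary keyed by `api.example.com` and `app.example.com`; an empty set for either name means the record is missing or has not propagated yet.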