The Challenge
Running applications in the cloud without monitoring means problems go undetected until users report them. CloudWatch provides the foundation for observability on AWS, giving you dashboards for visibility, alarms for proactive alerting, and centralized logging for troubleshooting.
What You'll Build
- CloudWatch dashboard visualizing key application metrics
- Custom metrics for application-specific data points
- Alarms with SNS notifications for threshold breaches
- Centralized log group for application log aggregation
Architecture Overview
This deployment creates a complete monitoring stack on AWS using CloudWatch and SNS. CloudWatch serves as the central observability platform, collecting metrics from your AWS resources and custom application data points. Dashboards display these metrics visually, alarms evaluate them against thresholds and trigger notifications, and log groups aggregate application logs for search and analysis.
The monitoring architecture is decoupled from your application infrastructure, which means you can deploy it independently and connect it to existing resources. CloudWatch automatically collects standard metrics from most AWS services (EC2 CPU utilization, Lambda invocation counts, RDS connections), and you add custom metrics for application-specific data like request latency, queue depth, or business transaction counts.
SNS provides the notification layer. When a CloudWatch alarm transitions to the ALARM state, it publishes a message to an SNS topic. That topic can deliver notifications by email or SMS directly, to Slack through a subscribed Lambda function, or to PagerDuty through an HTTPS endpoint subscription. This separation between alarm evaluation and notification delivery means you can route different alarms to different teams or escalation paths by changing topic subscriptions rather than the alarm's metric and threshold settings.
CloudWatch Dashboard
The dashboard provides a single-pane view of your application’s health. You define widgets that display metrics as time-series graphs, numbers, or text. Dashboards can combine metrics from multiple AWS services on one screen, so you can see EC2 CPU alongside RDS connections and Lambda error rates. Each widget specifies the metric namespace, dimensions, and time period to display.
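As a rough illustration of the widget definitions described above, the sketch below assembles a dashboard body as JSON using only the Python standard library. The instance ID, database identifier, function name, and region are hypothetical placeholders, and the helper names (`metric_widget`, `build_dashboard_body`) are made up for this example.

```python
import json

def metric_widget(title, metrics, x, y, region="us-east-1", stat="Average", period=300):
    """Return one dashboard widget that graphs the given metrics as a time series."""
    return {
        "type": "metric",
        # Position and size on the dashboard grid (the grid is 24 units wide).
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "view": "timeSeries",
            "region": region,      # assumed region, adjust to your deployment
            "stat": stat,
            "period": period,      # seconds per data point
            # Each entry: [namespace, metric name, dimension name, dimension value]
            "metrics": metrics,
        },
    }

def build_dashboard_body():
    """Combine EC2, RDS, and Lambda metrics on one dashboard (resource names are placeholders)."""
    widgets = [
        metric_widget("EC2 CPU",
                      [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
                      x=0, y=0),
        metric_widget("RDS connections",
                      [["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", "app-db"]],
                      x=12, y=0),
        metric_widget("Lambda errors",
                      [["AWS/Lambda", "Errors", "FunctionName", "app-handler"]],
                      x=0, y=6, stat="Sum"),
    ]
    return json.dumps({"widgets": widgets})

if __name__ == "__main__":
    print(build_dashboard_body())
```

The resulting string is the kind of value you would supply as the dashboard body, for example through the `DashboardBody` parameter of boto3's `put_dashboard` or the equivalent property in your infrastructure-as-code tool.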
Custom Metrics
Standard AWS metrics cover infrastructure health, but application-level metrics require custom metric publishing. Your application sends data points to CloudWatch using the PutMetricData API or through the CloudWatch agent. Common custom metrics include request duration percentiles, business transaction volumes, cache hit rates, and queue processing times. These metrics become available for dashboard widgets and alarm evaluation just like built-in metrics.
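To make the publishing step more concrete, here is a minimal sketch of how a PutMetricData request could be assembled. It only builds the request dictionary and does not call AWS. The namespace `MyApp`, the metric names, and the `Service` dimension are illustrative assumptions, not values from this deployment.

```python
from datetime import datetime, timezone

def build_metric_datum(name, value, unit, dimensions, timestamp=None):
    """Build one data point in the shape PutMetricData expects."""
    return {
        "MetricName": name,
        # Dimensions identify which resource or component the value belongs to.
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,  # for example "Milliseconds" or "Count"
    }

def build_put_metric_data_request(namespace, data):
    """Wrap data points in a request; custom namespaces may not start with 'AWS/'."""
    if namespace.startswith("AWS/"):
        raise ValueError("the AWS/ prefix is reserved for AWS service metrics")
    return {"Namespace": namespace, "MetricData": list(data)}

if __name__ == "__main__":
    request = build_put_metric_data_request("MyApp", [
        build_metric_datum("RequestLatency", 123, "Milliseconds", {"Service": "checkout"}),
        build_metric_datum("QueueDepth", 42, "Count", {"Service": "checkout"}),
    ])
    print(request)
```

With boto3 installed and credentials configured, a request like this could be sent with `boto3.client("cloudwatch").put_metric_data(**request)`.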
CloudWatch Alarms and SNS
Alarms evaluate a metric against a threshold over a specified number of evaluation periods. When the metric breaches the threshold (for example, CPU utilization exceeds 80% for three consecutive five-minute periods), the alarm transitions to the ALARM state and triggers its configured actions. The SNS topic receives the alarm message and fans it out to its subscribers. You can configure OK actions as well, so your team knows when the issue has been resolved.
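The example from the paragraph above (CPU above 80% for three consecutive five-minute periods) could be expressed roughly as follows. The alarm name, instance ID, and SNS topic ARN are hypothetical. The `alarm_state` function is a deliberately simplified model of the evaluation logic for illustration; the real service also supports options such as "M out of N" datapoints and configurable missing-data handling.

```python
def build_cpu_alarm(topic_arn, instance_id):
    """Build keyword arguments in the shape of a PutMetricAlarm request."""
    return {
        "AlarmName": "high-cpu",            # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                      # five-minute periods
        "EvaluationPeriods": 3,             # three consecutive periods
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],        # notify when entering ALARM
        "OKActions": [topic_arn],           # notify again on recovery
    }

def alarm_state(datapoints, threshold, evaluation_periods):
    """Simplified model: ALARM only if the most recent N values all exceed the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(v > threshold for v in recent) else "OK"

if __name__ == "__main__":
    alarm = build_cpu_alarm("arn:aws:sns:us-east-1:123456789012:app-alerts",
                            "i-0123456789abcdef0")
    print(alarm["AlarmName"], alarm_state([50, 85, 90, 95], alarm["Threshold"],
                                          alarm["EvaluationPeriods"]))
```

Pointing both `AlarmActions` and `OKActions` at the same topic is one simple choice; separate topics would let you route recovery messages differently.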
Log Group
The CloudWatch Logs log group aggregates log output from your application instances. EC2 instances send logs through the CloudWatch agent, ECS tasks use the awslogs driver, and Lambda functions write to CloudWatch Logs automatically as long as their execution role grants the required logging permissions. Centralized logging lets you search across all instances and services from one place, set up metric filters to extract numeric values from log patterns, and define retention policies to manage storage costs.
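A retention policy and a metric filter for such a log group might look like the sketch below. It only builds request dictionaries. The log group name `/app/production`, the 30-day retention, the filter name, and the assumption that the application writes JSON logs with a `level` field are all illustrative choices, not details from this deployment.

```python
LOG_GROUP = "/app/production"  # hypothetical log group name

def build_retention_request(log_group, days=30):
    """Request shape for setting how long log events are kept."""
    return {"logGroupName": log_group, "retentionInDays": days}

def build_error_count_filter(log_group, namespace="MyApp"):
    """Request shape for a metric filter that counts JSON log events with level ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",
        # JSON filter pattern; assumes each log line is a JSON object with a "level" field.
        "filterPattern": '{ $.level = "ERROR" }',
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": namespace,
            "metricValue": "1",      # emit 1 for every matching log event
            "defaultValue": 0.0,     # emit 0 when nothing matches in a period
        }],
    }

if __name__ == "__main__":
    print(build_retention_request(LOG_GROUP))
    print(build_error_count_filter(LOG_GROUP))
```

With boto3, these dictionaries correspond to the keyword arguments of the CloudWatch Logs client's `put_retention_policy` and `put_metric_filter` calls. The resulting `ErrorCount` metric can then back a dashboard widget or an alarm like any other metric.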
Common Customizations
- Add composite alarms: Combine multiple alarms into a composite alarm that only fires when several conditions are true simultaneously, reducing noise from transient single-metric spikes.
- Set up anomaly detection: Use CloudWatch anomaly detection instead of static thresholds for metrics with variable baselines, like request counts that fluctuate by time of day.
- Add metric filters: Create metric filters on log groups to extract numeric values from log messages (like response times or error counts) and use them in dashboards and alarms.
- Route to Slack or PagerDuty: Add a Lambda function subscribed to the SNS topic that formats alarm notifications and posts them to Slack channels or creates PagerDuty incidents.
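For the Slack routing option, a Lambda function subscribed to the SNS topic could format alarm messages along these lines. This is a minimal sketch: it parses the alarm JSON that CloudWatch places in the SNS message and returns Slack-style `{"text": ...}` payloads, but it does not actually post anything. Sending each payload to a Slack incoming webhook URL is left out and would be an addition of your own.

```python
import json

def format_alarm(sns_message):
    """Turn the JSON alarm message delivered via SNS into a Slack-style payload."""
    alarm = json.loads(sns_message)
    state = alarm["NewStateValue"]  # "ALARM", "OK", or "INSUFFICIENT_DATA"
    text = f"[{state}] {alarm['AlarmName']}: {alarm['NewStateReason']}"
    return {"text": text}

def handler(event, context):
    """Lambda entry point: one SNS invocation can carry several records."""
    payloads = [format_alarm(record["Sns"]["Message"]) for record in event["Records"]]
    # Posting each payload to a webhook is intentionally omitted in this sketch.
    return payloads

if __name__ == "__main__":
    sample_event = {"Records": [{"Sns": {"Message": json.dumps({
        "AlarmName": "high-cpu",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed",
    })}}]}
    print(handler(sample_event, None))
```

Keeping the formatting in a pure function such as `format_alarm` makes it easy to test locally with sample events before wiring the function to the topic.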