Build a Monitoring Dashboard

By Pulumi Team

The Challenge

Running applications in the cloud without monitoring means problems go undetected until users report them. CloudWatch provides the foundation for observability on AWS, giving you dashboards for visibility, alarms for proactive alerting, and centralized logging for troubleshooting.

What You'll Build

  • CloudWatch dashboard visualizing key application metrics
  • Custom metrics for application-specific data points
  • Alarms with SNS notifications for threshold breaches
  • Centralized log group for application log aggregation

Try This Prompt in Pulumi Neo

Run this prompt in Neo to deploy your infrastructure, or edit it to customize.

Best For

Use this prompt when you need to add monitoring and alerting to an existing AWS application, or when setting up observability infrastructure for a new deployment. Works well alongside any compute deployment (EC2, ECS, Lambda) that needs visibility into performance and health.

Architecture Overview

This deployment creates a complete monitoring stack on AWS using CloudWatch and SNS. CloudWatch serves as the central observability platform, collecting metrics from your AWS resources and custom application data points. Dashboards display these metrics visually, alarms evaluate them against thresholds and trigger notifications, and log groups aggregate application logs for search and analysis.

The monitoring architecture is decoupled from your application infrastructure, which means you can deploy it independently and connect it to existing resources. CloudWatch automatically collects standard metrics from most AWS services (EC2 CPU utilization, Lambda invocation counts, RDS connections), and you add custom metrics for application-specific data like request latency, queue depth, or business transaction counts.

SNS provides the notification layer. When a CloudWatch alarm transitions to the ALARM state, it publishes a message to an SNS topic. That topic can deliver notifications through email, SMS, Slack (via Lambda), or PagerDuty webhooks. This separation between alarm evaluation and notification delivery means you can route different alarms to different teams or escalation paths without changing the alarm configuration.

CloudWatch Dashboard

The dashboard provides a single-pane view of your application’s health. You define widgets that display metrics as time-series graphs, numbers, or text. Dashboards can combine metrics from multiple AWS services on one screen, so you can see EC2 CPU alongside RDS connections and Lambda error rates. Each widget specifies the metric namespace, dimensions, and time period to display.
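As a rough illustration of what a widget definition looks like, the sketch below assembles a dashboard body as JSON using only the Python standard library. The widget fields follow the CloudWatch dashboard body structure; the instance ID, database identifier, function name, and region are placeholder assumptions, not values from this deployment.

```python
import json


def metric_widget(title, metrics, x, y, stat="Average", period=300,
                  region="us-east-1", width=12, height=6):
    """Return one time-series metric widget for a CloudWatch dashboard body."""
    return {
        "type": "metric",
        # Position and size on the dashboard's 24-column grid.
        "x": x, "y": y, "width": width, "height": height,
        "properties": {
            "title": title,
            "view": "timeSeries",
            # Each entry is [namespace, metric name, dimension name, dimension value].
            "metrics": metrics,
            "stat": stat,
            "period": period,  # seconds per datapoint
            "region": region,
        },
    }


def build_dashboard_body():
    """Combine EC2, RDS, and Lambda metrics on one dashboard (placeholder IDs)."""
    widgets = [
        metric_widget("EC2 CPU",
                      [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
                      x=0, y=0),
        metric_widget("RDS connections",
                      [["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", "app-db"]],
                      x=12, y=0),
        metric_widget("Lambda errors",
                      [["AWS/Lambda", "Errors", "FunctionName", "app-handler"]],
                      x=0, y=6, stat="Sum"),
    ]
    return json.dumps({"widgets": widgets})
```

The resulting string is the kind of value a dashboard resource accepts as its body; how it is passed to your infrastructure code depends on the tooling you use.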

Custom Metrics

Standard AWS metrics cover infrastructure health, but application-level metrics require custom metric publishing. Your application sends data points to CloudWatch using the PutMetricData API or through the CloudWatch agent. Common custom metrics include request duration percentiles, business transaction volumes, cache hit rates, and queue processing times. These metrics become available for dashboard widgets and alarm evaluation just like built-in metrics.
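A minimal sketch of publishing one custom data point might look like the following. It assumes the boto3 SDK is installed and AWS credentials are configured when the real client is used; the namespace "MyApp", the metric name, and the dimension are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, dimensions, timestamp=None):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,  # for example "Milliseconds", "Count", or "Percent"
    }


def publish(namespace, data, client=None):
    """Send data points to CloudWatch; pass a client to reuse one or to stub it in tests."""
    if client is None:
        import boto3  # third-party AWS SDK, assumed to be installed
        client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=data)


# Example (hypothetical names): record a 123 ms request for the checkout service.
# publish("MyApp", [build_metric_datum("RequestLatency", 123, "Milliseconds",
#                                      {"Service": "checkout"})])
```

Keeping the payload builder separate from the network call makes the metric shape easy to check without touching AWS.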

CloudWatch Alarms and SNS

Alarms evaluate a metric against a threshold over a specified number of evaluation periods. When the metric breaches the threshold (for example, CPU utilization exceeds 80% for three consecutive five-minute periods), the alarm transitions to the ALARM state and triggers its configured actions. SNS topics receive these notifications and fan them out to subscribers. You can configure OK actions as well, so your team also learns when the alarm returns to the OK state.
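The evaluation rule described above can be sketched as a small local simulation. This is a simplification for illustration only: it requires every one of the most recent periods to breach the threshold, and it ignores CloudWatch options such as "M out of N" datapoints and configurable missing-data handling.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return the alarm state after the latest datapoint.

    ALARM only if each of the last `evaluation_periods` values exceeds
    `threshold`; INSUFFICIENT_DATA until enough datapoints exist.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


def state_transitions(datapoints, threshold, evaluation_periods):
    """List (index, new_state) pairs for each point where the state changes."""
    transitions = []
    previous = None
    for end in range(1, len(datapoints) + 1):
        state = evaluate_alarm(datapoints[:end], threshold, evaluation_periods)
        if state != previous:
            transitions.append((end - 1, state))
            previous = state
    return transitions


# CPU utilization sampled once per five-minute period, threshold 80%, 3 periods.
cpu = [40, 85, 90, 95, 70]
print(state_transitions(cpu, threshold=80, evaluation_periods=3))
# [(0, 'INSUFFICIENT_DATA'), (2, 'OK'), (3, 'ALARM'), (4, 'OK')]
```

Each transition is a point where alarm or OK actions would fire, which is why a single spike (one value above 80) produces no notification here.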

Log Group

The CloudWatch Logs group aggregates log output from your application instances. EC2 instances send logs through the CloudWatch agent, ECS tasks use the awslogs driver, and Lambda functions write to CloudWatch Logs automatically (by default, each function logs to its own log group). Centralized logging lets you search across all instances and services from one place, set up metric filters to extract numeric values from log patterns, and define retention policies to manage storage costs.
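To make the idea of extracting numeric values from logs concrete, here is a local sketch of the kind of work a metric filter performs. It is not CloudWatch filter-pattern syntax; it assumes the application writes JSON log lines with hypothetical "level" and "latency_ms" fields.

```python
import json


def extract_metrics(log_lines):
    """Count ERROR events and collect latency values from JSON log lines.

    Lines that are not valid JSON objects are skipped rather than failing.
    """
    latencies = []
    error_count = 0
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if event.get("level") == "ERROR":
            error_count += 1
        latency = event.get("latency_ms")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(latency, (int, float)) and not isinstance(latency, bool):
            latencies.append(latency)
    return {"error_count": error_count, "latencies_ms": latencies}
```

Structured (JSON) logging tends to make this kind of extraction simpler than matching free-form text, which is one reason to adopt it before adding filters.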

Common Customizations

  • Add composite alarms: Combine multiple alarms into a composite alarm that only fires when several conditions are true simultaneously, reducing noise from transient single-metric spikes.
  • Set up anomaly detection: Use CloudWatch anomaly detection instead of static thresholds for metrics with variable baselines, like request counts that fluctuate by time of day.
  • Add metric filters: Create metric filters on log groups to extract numeric values from log messages (like response times or error counts) and use them in dashboards and alarms.
  • Route to Slack or PagerDuty: Add a Lambda function subscribed to the SNS topic that formats alarm notifications and posts them to Slack channels or creates PagerDuty incidents.
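For the Slack routing customization, the formatting half of such a Lambda function could be sketched as follows. It assumes the standard SNS event shape, in which each record carries the alarm details as a JSON string under Sns.Message, and it only builds the Slack payloads; posting them to a webhook URL is left out, and the labels are an arbitrary choice.

```python
import json

# Hypothetical text labels for each alarm state.
STATE_LABELS = {"ALARM": "[ALARM]", "OK": "[OK]", "INSUFFICIENT_DATA": "[NO DATA]"}


def format_alarm_for_slack(sns_event):
    """Turn an SNS event carrying CloudWatch alarm messages into Slack payloads."""
    payloads = []
    for record in sns_event.get("Records", []):
        # The alarm notification arrives as a JSON-encoded string.
        alarm = json.loads(record["Sns"]["Message"])
        state = alarm.get("NewStateValue", "UNKNOWN")
        label = STATE_LABELS.get(state, "[" + state + "]")
        name = alarm.get("AlarmName", "unnamed alarm")
        reason = alarm.get("NewStateReason", "no reason given")
        payloads.append({"text": f"{label} {name}: {reason}"})
    return payloads


def handler(event, context):
    """Lambda entry point; a real function would POST each payload to Slack."""
    return format_alarm_for_slack(event)
```

Because the same topic can also deliver OK notifications, the formatter handles every state rather than only ALARM.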