Add production observability on AWS with Pulumi

Provision Amazon CloudWatch dashboards, log collection, alerting, and email notification wiring for a production-ready observability baseline on AWS.

Download blueprint

Get this AWS blueprint project as a zip in TypeScript, Python, or Go. Each download includes the matching Pulumi program, dependency files, and README.

This guide builds a small production observability baseline with Pulumi. It creates cloud-native dashboards, log or metric sources, alert rules, and email notification wiring for AWS without introducing another monitoring platform.

Use it when you already have a service and need a repeatable first layer of visibility: where errors are counted, where latency is visible, who gets notified, and what minimal trace hook every service should expose.

Architecture

  • Amazon CloudWatch Logs captures the service's logs.
  • An Amazon CloudWatch dashboard displays the health view operators open first.
  • CloudWatch metric alarms raise error and latency alerts from the built-in Lambda Errors and Duration metrics.
  • Amazon SNS sends alert notifications to the address in the notificationEmail Pulumi config value.
  • AWS X-Ray is wired through a minimal trace-ready hook, a sample Lambda function with active tracing, so your application can emit traces without changing the infrastructure shape later.
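
The trace hook pays off mostly through log correlation. For each invocation, Lambda passes the current X-Ray trace header in the _X_AMZN_TRACE_ID environment variable as a semicolon-separated list of Key=Value fields such as Root, Parent, and Sampled. The following Python sketch, assuming that format, splits the header so a service can attach the trace ID to its structured logs. parse_trace_header and trace_log_fields are hypothetical helpers, not part of the blueprint or an AWS SDK.

```python
import os
from typing import Mapping


def parse_trace_header(header: str) -> dict[str, str]:
    """Split an X-Ray trace header into its Key=Value fields."""
    fields: dict[str, str] = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:  # skip empty or malformed segments
            fields[key] = value
    return fields


def trace_log_fields(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return log fields that tie a log line to its X-Ray trace.

    Reads _X_AMZN_TRACE_ID (set by Lambda) and returns {} when it is absent,
    for example when the code runs outside Lambda."""
    fields = parse_trace_header(environ.get("_X_AMZN_TRACE_ID", ""))
    log_fields: dict[str, str] = {}
    if "Root" in fields:
        log_fields["trace_id"] = fields["Root"]
    if "Sampled" in fields:
        log_fields["sampled"] = fields["Sampled"]
    return log_fields
```

In a handler, you could merge the result into each JSON log line, for example json.dumps({"message": "handled", **trace_log_fields()}), so a log search by trace ID leads to the matching X-Ray trace.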

AWS observability shape

This variant uses a CloudWatch dashboard, a CloudWatch log group, the built-in Lambda metrics, an SNS email subscription, and X-Ray tracing on the sample Lambda hook.

Prerequisites

You need:

  • a Pulumi account and the Pulumi CLI
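  • an AWS account where you can create CloudWatch, Lambda, IAM, and SNS resources, with credentials configured for the Pulumi CLI
  • an email address or distribution list owned by your team for alert notifications
  • the toolchain for your blueprint language: Node.js and npm for TypeScript, Python 3 for Python, or Go 1.23 or newer for Go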

Download the blueprint

Download the AWS blueprint zip for your Pulumi language from the Download blueprint section at the top of this page. Along with a README, each zip contains only the files for its language:

  • TypeScript: index.ts as the Pulumi entrypoint, components/observability.ts as the reusable component, and package.json and tsconfig.json for the Pulumi project
  • Python: __main__.py as the Pulumi entrypoint, components/observability.py as the reusable component, and requirements.txt for the Pulumi project
  • Go: main.go as the Pulumi entrypoint, observability/observability.go as the reusable component, and go.mod for the Pulumi project

Unzip, change into the directory, and continue with the quickstart below.

Quickstart

Install dependencies, configure the alert recipient, and deploy.

TypeScript:

# 1. Install Pulumi project dependencies
npm install

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set aws:region us-west-2
pulumi config set notificationEmail <team-email-address>

# 3. Deploy
pulumi up

Python:

# 1. Install Pulumi project dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set aws:region us-west-2
pulumi config set notificationEmail <team-email-address>

# 3. Deploy
pulumi up

Go:

# 1. Install Pulumi project dependencies
go mod tidy

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set aws:region us-west-2
pulumi config set notificationEmail <team-email-address>

# 3. Deploy
pulumi up

Amazon SNS sends a confirmation email to the notificationEmail address. Alarm notifications reach that address only after the recipient confirms the subscription.

What Pulumi creates

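The stack provisions an Amazon CloudWatch dashboard, an Amazon CloudWatch Logs log group with 30-day retention, two CloudWatch metric alarms (Lambda errors and latency), an Amazon SNS topic with an email subscription, and a sample Lambda function with an IAM execution role and X-Ray active tracing. The sample function is small: it exists only to show how traces and alert dimensions attach to a real workload boundary.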

For production rollout, keep the component shape but replace the sample function or trace environment values with your real service, route names, and SLO thresholds.

Operate it

After pulumi up, use the stack outputs to find the resources operators need first.

Run pulumi stack output to list them. Open the CloudWatch dashboard named by the dashboardId output, tail the sample function's log group in CloudWatch Logs, and confirm the SNS email subscription before relying on the alarms.

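Start with the default thresholds: the errors alarm fires when the function's Errors metric sums to 1 or more in a single 60-second period, and the latency alarm fires when the average Duration exceeds 1,000 ms for two consecutive 60-second periods. Adjust them after the service has enough traffic to show normal error and latency patterns. Keep notification recipients in Pulumi config so the blueprint never hardcodes personal addresses.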
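
To reason about those defaults before real traffic arrives, the Python sketch below models how the two alarms evaluate per-period samples. It is a simplification, not CloudWatch's implementation: it assumes DatapointsToAlarm equals EvaluationPeriods (the default when DatapointsToAlarm is unset), treats a period with no samples as not breaching, and ignores the alarm's missing-data setting. alarm_state is a hypothetical helper.

```python
from statistics import mean
from typing import Callable

# Comparison operators used by the blueprint's two alarms.
OPERATORS: dict[str, Callable[[float, float], bool]] = {
    "GreaterThanOrEqualToThreshold": lambda value, threshold: value >= threshold,
    "GreaterThanThreshold": lambda value, threshold: value > threshold,
}
# Per-period statistics used by the two alarms.
STATISTICS: dict[str, Callable[[list[float]], float]] = {"Sum": sum, "Average": mean}


def alarm_state(periods: list[list[float]], statistic: str, operator: str,
                threshold: float, evaluation_periods: int) -> str:
    """Simplified alarm evaluation over per-period samples (oldest period first).

    Assumes DatapointsToAlarm == EvaluationPeriods and treats a period with
    no samples as not breaching; real CloudWatch also applies missing-data rules."""
    if len(periods) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    compare = OPERATORS[operator]
    aggregate = STATISTICS[statistic]
    recent = periods[-evaluation_periods:]
    # A period breaches only if it has samples and its statistic crosses the threshold.
    breaching = [bool(samples) and compare(aggregate(samples), threshold) for samples in recent]
    return "ALARM" if all(breaching) else "OK"
```

For example, alarm_state([[0, 0], [0, 1]], "Sum", "GreaterThanOrEqualToThreshold", 1, 1) returns "ALARM", because a single error in the latest minute is enough to trip the errors alarm, while the latency alarm needs both of the last two periods to average above 1,000 ms.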

Blueprint Pulumi program

The entrypoint reads the notification email from Pulumi config, creates the observability component, and exports the outputs operators need first: dashboardId, notificationTarget, and traceHook.

index.ts

import * as pulumi from "@pulumi/pulumi";
import { Observability } from "./components/observability";

const config = new pulumi.Config();
const notificationEmail = config.require("notificationEmail");

const observability = new Observability("observability", {
    notificationEmail,
    namePrefix: `${pulumi.getStack()}-production-observability`,
    tags: { environment: pulumi.getStack(), "solution-family": "production-observability", cloud: "aws", language: "typescript" },
});

export const dashboardId = observability.dashboardId;
export const notificationTarget = observability.notificationTarget;
export const traceHook = observability.traceHook;

__main__.py

import pulumi
from components.observability import Observability

config = pulumi.Config()
notification_email = config.require("notificationEmail")

observability = Observability(
    "observability",
    notification_email=notification_email,
    name_prefix=f"{pulumi.get_stack()}-production-observability",
    tags={"environment": pulumi.get_stack(), "solution-family": "production-observability",
          "cloud": "aws", "language": "python"},
)

pulumi.export("dashboardId", observability.dashboard_id)
pulumi.export("notificationTarget", observability.notification_target)
pulumi.export("traceHook", observability.trace_hook)

main.go

package main

import (
    "fmt"

    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"

    "production-observability-aws/observability"
)

// Program reads the alert recipient from config, creates the observability
// component, and exports the outputs operators need first.
func Program(ctx *pulumi.Context) error {
    cfg := config.New(ctx, "")
    baseline, err := observability.NewObservability(ctx, "observability", &observability.ObservabilityArgs{
        NotificationEmail: cfg.Require("notificationEmail"),
        NamePrefix:        fmt.Sprintf("%s-production-observability", ctx.Stack()),
        Tags: pulumi.StringMap{
            "environment":     pulumi.String(ctx.Stack()),
            "solution-family": pulumi.String("production-observability"),
            "cloud":           pulumi.String("aws"),
            "language":        pulumi.String("go"),
        },
    })
    if err != nil {
        return err
    }
    ctx.Export("dashboardId", baseline.DashboardID)
    ctx.Export("notificationTarget", baseline.NotificationTarget)
    ctx.Export("traceHook", baseline.TraceHook)
    return nil
}

func main() {
    pulumi.Run(Program)
}
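
Every physical name in the component is built from namePrefix, which the entrypoint derives from the stack name. AWS limits Lambda function names to 64 characters, so a long stack name can make pulumi up fail when it creates the sample function. The Python sketch below mirrors the component's naming scheme so you can check a stack name before deploying; resource_names and check_function_name are hypothetical helpers, not part of the blueprint.

```python
LAMBDA_FUNCTION_NAME_LIMIT = 64  # AWS maximum length for a Lambda function name


def resource_names(stack: str) -> dict[str, str]:
    """Mirror the component's naming: every name starts with namePrefix."""
    prefix = f"{stack}-production-observability"  # namePrefix in the entrypoint
    return {
        "topic": f"{prefix}-alerts",
        "function": f"{prefix}-sample",
        "dashboard": f"{prefix}-dashboard",
        "errors_alarm": f"{prefix}-lambda-errors",
        "latency_alarm": f"{prefix}-lambda-latency",
    }


def check_function_name(stack: str) -> str:
    """Return the sample function name, or raise if it exceeds the Lambda limit."""
    name = resource_names(stack)["function"]
    if len(name) > LAMBDA_FUNCTION_NAME_LIMIT:
        raise ValueError(
            f"function name {name!r} has {len(name)} characters; "
            f"Lambda allows at most {LAMBDA_FUNCTION_NAME_LIMIT}"
        )
    return name
```

The suffix -production-observability-sample adds 32 characters, so stack names longer than 32 characters push the function name past the limit.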

Reusable observability component

The component provisions the dashboard, log or metric source, alert rules, notification target, and minimal trace-ready service hook for AWS.

components/observability.ts

Creates the Amazon CloudWatch dashboards, Amazon CloudWatch Logs wiring, alert rules, notification target, and AWS X-Ray hook.

import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";
export interface ObservabilityArgs { notificationEmail: string; namePrefix: string; tags: Record<string, string>; }
export class Observability extends pulumi.ComponentResource {
  public readonly dashboardId: pulumi.Output<string>; public readonly notificationTarget: pulumi.Output<string>; public readonly traceHook: pulumi.Output<string>;
  constructor(name: string, args: ObservabilityArgs, opts?: pulumi.ComponentResourceOptions) {
    super("guides:productionObservability:Aws", name, {}, opts);
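    // Lambda writes to /aws/lambda/<function name>, so this name must match the sample function for the 30-day retention to apply.
    const logGroup = new aws.cloudwatch.LogGroup(`${name}-logs`, { name: `/aws/lambda/${args.namePrefix}-sample`, retentionInDays: 30, tags: args.tags }, { parent: this });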
    const topic = new aws.sns.Topic(`${name}-alerts`, { name: `${args.namePrefix}-alerts`, tags: args.tags }, { parent: this });
    new aws.sns.TopicSubscription(`${name}-email`, { topic: topic.arn, protocol: "email", endpoint: args.notificationEmail }, { parent: this });
    const role = new aws.iam.Role(`${name}-lambda-role`, { assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "lambda.amazonaws.com" }), tags: args.tags }, { parent: this });
    new aws.iam.RolePolicyAttachment(`${name}-basic`, { role: role.name, policyArn: aws.iam.ManagedPolicy.AWSLambdaBasicExecutionRole }, { parent: this });
    new aws.iam.RolePolicyAttachment(`${name}-xray`, { role: role.name, policyArn: aws.iam.ManagedPolicy.AWSXRayDaemonWriteAccess }, { parent: this });
    const fn = new aws.lambda.Function(`${name}-sample`, { name: `${args.namePrefix}-sample`, role: role.arn, runtime: aws.lambda.Runtime.NodeJS20dX, handler: "index.handler", code: new pulumi.asset.AssetArchive({ "index.js": new pulumi.asset.StringAsset("exports.handler = async () => ({ statusCode: 200, body: 'ok' });") }), tracingConfig: { mode: "Active" }, environment: { variables: { POWERTOOLS_SERVICE_NAME: args.namePrefix } }, tags: args.tags }, { parent: this, dependsOn: [logGroup] });
    const actions = [topic.arn];
    new aws.cloudwatch.MetricAlarm(`${name}-errors`, { name: `${args.namePrefix}-lambda-errors`, comparisonOperator: "GreaterThanOrEqualToThreshold", evaluationPeriods: 1, metricName: "Errors", namespace: "AWS/Lambda", period: 60, statistic: "Sum", threshold: 1, dimensions: { FunctionName: fn.name }, alarmActions: actions }, { parent: this });
    new aws.cloudwatch.MetricAlarm(`${name}-latency`, { name: `${args.namePrefix}-lambda-latency`, comparisonOperator: "GreaterThanThreshold", evaluationPeriods: 2, metricName: "Duration", namespace: "AWS/Lambda", period: 60, statistic: "Average", threshold: 1000, dimensions: { FunctionName: fn.name }, alarmActions: actions }, { parent: this });
    const dashboard = new aws.cloudwatch.Dashboard(`${name}-dashboard`, { dashboardName: `${args.namePrefix}-dashboard`, dashboardBody: pulumi.jsonStringify({ widgets: [{ type: "metric", width: 12, height: 6, properties: { metrics: [["AWS/Lambda", "Errors", "FunctionName", fn.name], [".", "Duration", ".", "."]], period: 60, stat: "Average", region: aws.config.region, title: "Sample Lambda health" } }] }) }, { parent: this });
    this.dashboardId = dashboard.dashboardName; this.notificationTarget = topic.arn; this.traceHook = fn.name.apply(v => `Lambda ${v} has X-Ray tracing active`); this.registerOutputs({ dashboardId: this.dashboardId, notificationTarget: this.notificationTarget, traceHook: this.traceHook });
  }
}

components/observability.py

Creates the Amazon CloudWatch dashboards, Amazon CloudWatch Logs wiring, alert rules, notification target, and AWS X-Ray hook.

import json
import pulumi
import pulumi_aws as aws
class Observability(pulumi.ComponentResource):
    def __init__(self, name, notification_email, name_prefix, tags, opts=None):
        super().__init__("guides:productionObservability:Aws", name, None, opts)
        child = pulumi.ResourceOptions(parent=self)
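        # Lambda writes to /aws/lambda/<function name>, so this name must match the sample function for the 30-day retention to apply.
        log_group = aws.cloudwatch.LogGroup(f"{name}-logs", name=f"/aws/lambda/{name_prefix}-sample", retention_in_days=30, tags=tags, opts=child)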
        topic = aws.sns.Topic(f"{name}-alerts", name=f"{name_prefix}-alerts", tags=tags, opts=child)
        aws.sns.TopicSubscription(f"{name}-email", topic=topic.arn, protocol="email", endpoint=notification_email, opts=child)
        role = aws.iam.Role(f"{name}-lambda-role", assume_role_policy=json.dumps({"Version":"2012-10-17","Statement":[{"Action":"sts:AssumeRole","Principal":{"Service":"lambda.amazonaws.com"},"Effect":"Allow"}]}), tags=tags, opts=child)
        aws.iam.RolePolicyAttachment(f"{name}-basic", role=role.name, policy_arn=aws.iam.ManagedPolicy.AWS_LAMBDA_BASIC_EXECUTION_ROLE, opts=child)
        aws.iam.RolePolicyAttachment(f"{name}-xray", role=role.name, policy_arn="arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess", opts=child)
        fn = aws.lambda_.Function(f"{name}-sample", name=f"{name_prefix}-sample", role=role.arn, runtime=aws.lambda_.Runtime.NODE_JS20D_X, handler="index.handler", code=pulumi.AssetArchive({"index.js": pulumi.StringAsset("exports.handler = async () => ({ statusCode: 200, body: 'ok' });")}), tracing_config={"mode": "Active"}, environment={"variables": {"POWERTOOLS_SERVICE_NAME": name_prefix}}, tags=tags, opts=pulumi.ResourceOptions(parent=self, depends_on=[log_group]))
        actions = [topic.arn]
        aws.cloudwatch.MetricAlarm(f"{name}-errors", name=f"{name_prefix}-lambda-errors", comparison_operator="GreaterThanOrEqualToThreshold", evaluation_periods=1, metric_name="Errors", namespace="AWS/Lambda", period=60, statistic="Sum", threshold=1, dimensions={"FunctionName": fn.name}, alarm_actions=actions, opts=child)
        aws.cloudwatch.MetricAlarm(f"{name}-latency", name=f"{name_prefix}-lambda-latency", comparison_operator="GreaterThanThreshold", evaluation_periods=2, metric_name="Duration", namespace="AWS/Lambda", period=60, statistic="Average", threshold=1000, dimensions={"FunctionName": fn.name}, alarm_actions=actions, opts=child)
        # CloudWatch metric widgets need an explicit region; read it from the aws:region stack config.
        region = pulumi.Config("aws").require("region")
        dashboard = aws.cloudwatch.Dashboard(f"{name}-dashboard", dashboard_name=f"{name_prefix}-dashboard", dashboard_body=fn.name.apply(lambda n: json.dumps({"widgets": [{"type": "metric", "width": 12, "height": 6, "properties": {"metrics": [["AWS/Lambda", "Errors", "FunctionName", n], [".", "Duration", ".", "."]], "period": 60, "stat": "Average", "region": region, "title": "Sample Lambda health"}}]})), opts=child)
        self.dashboard_id = dashboard.dashboard_name; self.notification_target = topic.arn; self.trace_hook = fn.name.apply(lambda n: f"Lambda {n} has X-Ray tracing active")
        self.register_outputs({"dashboard_id": self.dashboard_id, "notification_target": self.notification_target, "trace_hook": self.trace_hook})

observability/observability.go

Creates the Amazon CloudWatch dashboards, Amazon CloudWatch Logs wiring, alert rules, notification target, and AWS X-Ray hook.

package observability

import (
    "encoding/json"
    "fmt"

    "github.com/pulumi/pulumi-aws/sdk/v7/go/aws/cloudwatch"
    "github.com/pulumi/pulumi-aws/sdk/v7/go/aws/iam"
    "github.com/pulumi/pulumi-aws/sdk/v7/go/aws/lambda"
    "github.com/pulumi/pulumi-aws/sdk/v7/go/aws/sns"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

type ObservabilityArgs struct { NotificationEmail string; NamePrefix string; Tags pulumi.StringMap }
type Observability struct { pulumi.ResourceState; DashboardID pulumi.StringOutput; NotificationTarget pulumi.StringOutput; TraceHook pulumi.StringOutput }

func NewObservability(ctx *pulumi.Context, name string, args *ObservabilityArgs, opts ...pulumi.ResourceOption) (*Observability, error) {
    component := &Observability{}
    if err := ctx.RegisterComponentResource("guides:productionObservability:Aws", name, component, opts...); err != nil { return nil, err }
    child := pulumi.Parent(component)
    // Lambda writes to /aws/lambda/<function name>, so this name must match the sample function for the 30-day retention to apply.
    logGroup, err := cloudwatch.NewLogGroup(ctx, name+"-logs", &cloudwatch.LogGroupArgs{Name: pulumi.String(fmt.Sprintf("/aws/lambda/%s-sample", args.NamePrefix)), RetentionInDays: pulumi.Int(30), Tags: args.Tags}, child); if err != nil { return nil, err }
    topic, err := sns.NewTopic(ctx, name+"-alerts", &sns.TopicArgs{Name: pulumi.String(args.NamePrefix+"-alerts"), Tags: args.Tags}, child); if err != nil { return nil, err }
    _, err = sns.NewTopicSubscription(ctx, name+"-email", &sns.TopicSubscriptionArgs{Topic: topic.Arn, Protocol: pulumi.String("email"), Endpoint: pulumi.String(args.NotificationEmail)}, child); if err != nil { return nil, err }
    assumeRole := `{"Version":"2012-10-17","Statement":[{"Action":"sts:AssumeRole","Principal":{"Service":"lambda.amazonaws.com"},"Effect":"Allow"}]}`
    role, err := iam.NewRole(ctx, name+"-lambda-role", &iam.RoleArgs{AssumeRolePolicy: pulumi.String(assumeRole), Tags: args.Tags}, child); if err != nil { return nil, err }
    _, err = iam.NewRolePolicyAttachment(ctx, name+"-basic", &iam.RolePolicyAttachmentArgs{Role: role.Name, PolicyArn: pulumi.String("arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole")}, child); if err != nil { return nil, err }
    _, err = iam.NewRolePolicyAttachment(ctx, name+"-xray", &iam.RolePolicyAttachmentArgs{Role: role.Name, PolicyArn: pulumi.String("arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess")}, child); if err != nil { return nil, err }
    fn, err := lambda.NewFunction(ctx, name+"-sample", &lambda.FunctionArgs{Name: pulumi.String(args.NamePrefix+"-sample"), Role: role.Arn, Runtime: pulumi.String(lambda.RuntimeNodeJS20dX), Handler: pulumi.String("index.handler"), Code: pulumi.NewAssetArchive(map[string]interface{}{"index.js": pulumi.NewStringAsset("exports.handler = async () => ({ statusCode: 200, body: 'ok' });")}), TracingConfig: &lambda.FunctionTracingConfigArgs{Mode: pulumi.String("Active")}, Environment: &lambda.FunctionEnvironmentArgs{Variables: pulumi.StringMap{"POWERTOOLS_SERVICE_NAME": pulumi.String(args.NamePrefix)}}, Tags: args.Tags}, child, pulumi.DependsOn([]pulumi.Resource{logGroup})); if err != nil { return nil, err }
    actions := pulumi.Array{topic.Arn}
    _, err = cloudwatch.NewMetricAlarm(ctx, name+"-errors", &cloudwatch.MetricAlarmArgs{Name: pulumi.String(args.NamePrefix+"-lambda-errors"), ComparisonOperator: pulumi.String("GreaterThanOrEqualToThreshold"), EvaluationPeriods: pulumi.Int(1), MetricName: pulumi.String("Errors"), Namespace: pulumi.String("AWS/Lambda"), Period: pulumi.Int(60), Statistic: pulumi.String("Sum"), Threshold: pulumi.Float64(1), Dimensions: pulumi.StringMap{"FunctionName": fn.Name}, AlarmActions: actions}, child); if err != nil { return nil, err }
    _, err = cloudwatch.NewMetricAlarm(ctx, name+"-latency", &cloudwatch.MetricAlarmArgs{Name: pulumi.String(args.NamePrefix+"-lambda-latency"), ComparisonOperator: pulumi.String("GreaterThanThreshold"), EvaluationPeriods: pulumi.Int(2), MetricName: pulumi.String("Duration"), Namespace: pulumi.String("AWS/Lambda"), Period: pulumi.Int(60), Statistic: pulumi.String("Average"), Threshold: pulumi.Float64(1000), Dimensions: pulumi.StringMap{"FunctionName": fn.Name}, AlarmActions: actions}, child); if err != nil { return nil, err }
    // CloudWatch metric widgets need an explicit region; read it from the aws:region stack config.
    region := config.Require(ctx, "aws:region")
    body := fn.Name.ApplyT(func(functionName string) (string, error) { data, err := json.Marshal(map[string]interface{}{"widgets": []interface{}{map[string]interface{}{"type":"metric","width":12,"height":6,"properties":map[string]interface{}{"metrics": []interface{}{[]interface{}{"AWS/Lambda","Errors","FunctionName",functionName}, []interface{}{".","Duration",".","."}},"period":60,"stat":"Average","region":region,"title":"Sample Lambda health"}}}}); return string(data), err }).(pulumi.StringOutput)
    dashboard, err := cloudwatch.NewDashboard(ctx, name+"-dashboard", &cloudwatch.DashboardArgs{DashboardName: pulumi.String(args.NamePrefix+"-dashboard"), DashboardBody: body}, child); if err != nil { return nil, err }
    component.DashboardID = dashboard.DashboardName; component.NotificationTarget = topic.Arn; component.TraceHook = fn.Name.ApplyT(func(v string) string { return "Lambda " + v + " has X-Ray tracing active" }).(pulumi.StringOutput)
    // Register the component's outputs, as registerOutputs does in the TypeScript and Python variants.
    if err := ctx.RegisterResourceOutputs(component, pulumi.Map{"dashboardId": component.DashboardID, "notificationTarget": component.NotificationTarget, "traceHook": component.TraceHook}); err != nil { return nil, err }
    return component, nil
}
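
Each component builds the same one-widget dashboard body. Two details of the CloudWatch dashboard body format are easy to miss: the dashboard body reference lists region as a required property of a metric widget, and a "." inside a metric row repeats the value at the same position in the previous row, so [".", "Duration", ".", "."] inherits the namespace and FunctionName dimension from the Errors row. The Python sketch below builds an equivalent body and expands that shorthand so you can inspect what the widget will plot. dashboard_body and expand_metrics are hypothetical helpers, and the expansion handles only the single-dot form.

```python
import json


def dashboard_body(function_name: str, region: str) -> str:
    """Build a one-widget dashboard body shaped like the blueprint's."""
    widget = {
        "type": "metric",
        "width": 12,
        "height": 6,
        "properties": {
            "metrics": [
                ["AWS/Lambda", "Errors", "FunctionName", function_name],
                [".", "Duration", ".", "."],  # "." repeats the row above
            ],
            "period": 60,
            "stat": "Average",
            "region": region,
            "title": "Sample Lambda health",
        },
    }
    return json.dumps({"widgets": [widget]})


def expand_metrics(metrics: list[list]) -> list[list]:
    """Replace each "." with the value at the same position in the previous row.

    Handles only the single-dot form; other items, such as a trailing
    rendering-options dict, are copied unchanged."""
    expanded: list[list] = []
    for row in metrics:
        previous = expanded[-1] if expanded else []
        expanded.append([
            previous[i] if item == "." and i < len(previous) else item
            for i, item in enumerate(row)
        ])
    return expanded
```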

Frequently asked questions

Does this deploy an application?
It deploys only the smallest service hook needed to demonstrate log, metric, and trace wiring. Bring your real service names, metric filters, and alert thresholds before using the blueprint for production traffic.
Where does the notification email come from?
Each starter reads a Pulumi config value named notificationEmail. Set it to the address or distribution list your team controls before running pulumi up.
Does this include incident management or on-call rotation?
No. The blueprint stops at cloud-native email notification targets so you can connect your own incident workflow later without adding another platform to the starter.
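
When you do connect an incident workflow, a common first step is reading the alarm notifications that CloudWatch publishes to the SNS topic. The Python sketch below turns one notification's JSON message into a single line. It assumes the message carries AlarmName and NewStateValue, and optionally OldStateValue, NewStateReason, and a Trigger object with MetricName; verify those fields against a real notification from your topic before relying on them. summarize_alarm is a hypothetical helper. In a Lambda function subscribed to the topic, the message string arrives at event["Records"][0]["Sns"]["Message"].

```python
import json


def summarize_alarm(message: str) -> str:
    """Render a CloudWatch alarm notification (the SNS Message string) as one line.

    Assumes AlarmName and NewStateValue are present; the other fields fall
    back to placeholders when missing."""
    alarm = json.loads(message)
    trigger = alarm.get("Trigger") or {}
    old_state = alarm.get("OldStateValue", "?")
    new_state = alarm["NewStateValue"]
    metric = trigger.get("MetricName", "unknown metric")
    reason = alarm.get("NewStateReason", "")
    return f"[{new_state}] {alarm['AlarmName']} ({old_state} -> {new_state}) {metric}: {reason}"
```
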
What should I tune first?
Tune the error threshold, latency threshold, evaluation window, and dashboard widgets to match your service baseline after the first few deploys.
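
One way to pick a starting latency threshold, assuming you can export the per-minute average Duration values for a representative stretch of traffic, is to take a high percentile of those averages and add headroom. The Python sketch below uses the nearest-rank percentile method. The 99th percentile and the 1.2 headroom factor are illustrative defaults, not AWS guidance, and nearest_rank_percentile and suggest_threshold are hypothetical helpers.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def suggest_threshold(period_averages_ms: list[float], pct: float = 99, headroom: float = 1.2) -> float:
    """Suggest a Duration alarm threshold: a high percentile of normal per-period averages, plus headroom."""
    return round(nearest_rank_percentile(period_averages_ms, pct) * headroom, 1)
```

For per-minute averages of 1 through 100 ms, the 99th percentile is 99 ms and the suggested threshold is 118.8 ms. Because the latency alarm uses the Average statistic, deriving the threshold from per-period averages keeps the suggestion on the same scale the alarm evaluates.
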
How do I clean it up?
Run pulumi destroy from the same stack. Deleting the SNS topic also removes its subscriptions; afterwards, check the AWS console for any related resources that exist outside Pulumi state.