Add production observability on Azure with Pulumi

Provision Azure Portal dashboards, log collection, alerting, and email notification wiring for a production-ready observability baseline on Azure.

Download blueprint

Get this Azure blueprint project as a zip. Switch Pulumi language here to keep the download aligned with the install commands and blueprint program on the page.

This guide builds a small production observability baseline with Pulumi. It creates cloud-native dashboards, log or metric sources, alert rules, and email notification wiring for Azure without introducing another monitoring platform.

Use it when you already have a service and need a repeatable first layer of visibility: where errors are counted, where latency is visible, who gets notified, and what minimal trace hook every service should expose.

Architecture

  • Log Analytics captures or derives service health signals.
  • Azure Portal dashboards display the health view operators open first.
  • Azure Monitor metric alerts raise error and latency alerts where the platform supports the metric directly.
  • Azure Monitor Action Groups send alert notifications to the address in the notificationEmail Pulumi config value.
  • Application Insights is wired through a minimal trace-ready hook so your application can emit traces without changing the infrastructure shape later.

Azure observability shape

This variant uses Log Analytics, Application Insights, an Azure Portal dashboard resource, Azure Monitor metric alerting, and an Action Group email receiver.

Prerequisites

You need:

  • a Pulumi account and the Pulumi CLI
  • an Azure subscription where you can create resource groups, Log Analytics, Application Insights, dashboards, metric alerts, and action groups
  • an email address or distribution list owned by your team for alert notifications
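  • the runtime for your chosen language: Node.js with npm for TypeScript, Python 3 for Python, or Go 1.23 or newer for Go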

Download the blueprint

Use the Download blueprint button at the top of this page to grab the Azure zip for the language selected in the chooser. Each zip contains the files for that language.

TypeScript

  • index.ts as the Pulumi entrypoint
  • components/observability.ts as the reusable component
  • package.json and tsconfig.json for the Pulumi project

Python

  • __main__.py as the Pulumi entrypoint
  • components/observability.py as the reusable component
  • requirements.txt for the Pulumi project

Go

  • main.go as the Pulumi entrypoint
  • observability/observability.go as the reusable component
  • go.mod for the Pulumi project

Unzip, change into the directory, and continue with the quickstart below.

Quickstart

Install dependencies, configure the alert recipient, and deploy.

TypeScript

# 1. Install Pulumi project dependencies
npm install

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set azure-native:location eastus
pulumi config set notificationEmail <team-email-address>

# 3. Deploy
pulumi up

Python

# 1. Install Pulumi project dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set azure-native:location eastus
pulumi config set notificationEmail <team-email-address>

# 3. Deploy
pulumi up

Go

# 1. Install Pulumi project dependencies
go mod tidy

# 2. Initialize and configure the stack
pulumi stack init dev
pulumi config set azure-native:location eastus
pulumi config set notificationEmail <team-email-address>

# 3. Deploy
pulumi up

Azure sends action group notifications to the configured receiver after the action group is created.
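
Because the deployment wires whatever string is stored in notificationEmail into the action group, it can help to reject obvious mistakes (such as the literal <team-email-address> placeholder) before any resources are created. The following is a minimal sketch under that assumption; assertPlausibleEmail is a hypothetical helper that is not part of the blueprint, and the pattern is deliberately loose rather than a full address validator.

```typescript
// Hypothetical helper, not part of the blueprint: a loose sanity check that
// catches empty values and unreplaced placeholders before deployment.
function assertPlausibleEmail(value: string): string {
  const trimmed = value.trim();
  // Requires exactly one "@", no whitespace, and a dot in the domain part.
  const pattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  if (!pattern.test(trimmed)) {
    throw new Error(`notificationEmail does not look like an email address: "${value}"`);
  }
  return trimmed;
}
```

In index.ts this could wrap the existing lookup, for example assertPlausibleEmail(config.require("notificationEmail")), so a bad value fails the preview instead of producing an action group that notifies nobody.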

What Pulumi creates

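The stack provisions a resource group, a Log Analytics workspace with 30-day retention, a workspace-based Application Insights component, an Azure Monitor Action Group with one email receiver, an Azure Monitor metric alert on failed requests, and an Azure Portal dashboard. The metric alert has severity 2, is evaluated every minute over a five-minute window, and fires when the failed-request count exceeds 1. The dashboard is created without tiles, so add parts for the signals your operators check first. The sample service hook is small: it exists only to show how traces and alert dimensions attach to a real workload boundary.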

For production rollout, keep the component shape but replace the sample function or trace environment values with your real service, route names, and SLO thresholds.

Operate it

After pulumi up, use the stack outputs to find the resources operators need first.

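Open the dashboard identified by dashboardId in the Azure Portal. The traceHook output is an environment variable assignment of the form APPLICATIONINSIGHTS_CONNECTION_STRING=<connection string>; set that variable in your app runtime so the Application Insights SDK can send telemetry to the component this stack created.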
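
The component builds traceHook as a single KEY=VALUE string, and the connection string on the right-hand side contains further "=" characters. If a deployment script needs the variable name and the value separately, a split on the first "=" only is enough. The sketch below assumes that shape; parseEnvAssignment is a hypothetical helper, and the connection string is a made-up placeholder rather than real output.

```typescript
// Hypothetical helper: split a KEY=VALUE line such as the traceHook output.
// Only the first "=" separates the name from the value, because the
// connection string itself contains "=" characters.
function parseEnvAssignment(line: string): { key: string; value: string } {
  const idx = line.indexOf("=");
  if (idx <= 0) {
    throw new Error(`expected KEY=VALUE, got: ${line}`);
  }
  return { key: line.slice(0, idx), value: line.slice(idx + 1) };
}

// Placeholder input; a real value comes from `pulumi stack output traceHook`.
const exampleHook =
  "APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://example.invalid/";
const parsedHook = parseEnvAssignment(exampleHook);
```

Here parsedHook.key holds the variable name and parsedHook.value holds the full connection string, ready to pass to whatever sets environment variables for your runtime.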

Start with the default thresholds, then adjust them after the service has enough traffic to show normal error and latency patterns. Keep notification recipients in Pulumi config so the starter never hardcodes personal addresses.
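
One way to make that tuning explicit is to keep thresholds in a small typed object and derive the alert criteria from it. The sketch below is an illustration, not the blueprint's code: AlertThresholds and buildCriteria are hypothetical names, the failed-request default of 1 matches the component, and the 1000 ms latency default is an assumption chosen only for the example. The latency criterion uses the Application Insights requests/duration metric, which reports server response time in milliseconds.

```typescript
// Hypothetical types and defaults for illustration only.
interface AlertThresholds {
  failedRequestCount: number; // failed requests per window before alerting
  latencyMs: number; // average server response time in milliseconds
}

// failedRequestCount mirrors the component; latencyMs is an assumed value.
const defaultThresholds: AlertThresholds = { failedRequestCount: 1, latencyMs: 1000 };

// Returns one criterion per alert, keyed by alert name, with overrides applied.
function buildCriteria(overrides: Partial<AlertThresholds> = {}) {
  const t: AlertThresholds = { ...defaultThresholds, ...overrides };
  const base = {
    criterionType: "StaticThresholdCriterion",
    metricNamespace: "microsoft.insights/components",
    operator: "GreaterThan",
  };
  return {
    failedRequests: {
      ...base,
      name: "failedRequests",
      metricName: "requests/failed",
      threshold: t.failedRequestCount,
      timeAggregation: "Count",
    },
    serverLatency: {
      ...base,
      name: "serverLatency",
      metricName: "requests/duration",
      threshold: t.latencyMs,
      timeAggregation: "Average",
    },
  };
}
```

Criteria listed together in one allOf array must all be met before the rule fires, so error and latency conditions that should notify independently each belong in their own MetricAlert, for example by passing [buildCriteria().serverLatency] as the allOf of a second rule.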

Blueprint Pulumi program

The entrypoint reads the notification email from Pulumi config, creates the observability component, and exports operator-facing resources.

index.ts

import * as pulumi from "@pulumi/pulumi";
import { Observability } from "./components/observability";

const config = new pulumi.Config();
const notificationEmail = config.require("notificationEmail");

const observability = new Observability("observability", {
    notificationEmail,
    namePrefix: `${pulumi.getStack()}-production-observability`,
    tags: { environment: pulumi.getStack(), "solution-family": "production-observability", cloud: "azure", language: "typescript" },
});

export const dashboardId = observability.dashboardId;
export const notificationTarget = observability.notificationTarget;
export const traceHook = observability.traceHook;

__main__.py

import pulumi
from components.observability import Observability

config = pulumi.Config()
notification_email = config.require("notificationEmail")

observability = Observability(
    "observability",
    notification_email=notification_email,
    name_prefix=f"{pulumi.get_stack()}-production-observability",
    tags={
        "environment": pulumi.get_stack(),
        "solution-family": "production-observability",
        "cloud": "azure",
        "language": "python",
    },
)

pulumi.export("dashboardId", observability.dashboard_id)
pulumi.export("notificationTarget", observability.notification_target)
pulumi.export("traceHook", observability.trace_hook)

main.go

package main

import (
    "fmt"

    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"

    "production-observability-azure/observability"
)

func Program(ctx *pulumi.Context) error {
    cfg := config.New(ctx, "")
    baseline, err := observability.NewObservability(ctx, "observability", &observability.ObservabilityArgs{
        NotificationEmail: cfg.Require("notificationEmail"),
        NamePrefix:        fmt.Sprintf("%s-production-observability", ctx.Stack()),
        Tags: pulumi.StringMap{
            "environment":     pulumi.String(ctx.Stack()),
            "solution-family": pulumi.String("production-observability"),
            "cloud":           pulumi.String("azure"),
            "language":        pulumi.String("go"),
        },
    })
    if err != nil {
        return err
    }
    ctx.Export("dashboardId", baseline.DashboardID)
    ctx.Export("notificationTarget", baseline.NotificationTarget)
    ctx.Export("traceHook", baseline.TraceHook)
    return nil
}

func main() {
    pulumi.Run(Program)
}

Reusable observability component

The component provisions the dashboard, log or metric source, alert rules, notification target, and minimal trace-ready service hook for Azure.

components/observability.ts

Creates the Azure Portal dashboards, Log Analytics wiring, alert rules, notification target, and Application Insights hook.

import * as applicationinsights from "@pulumi/azure-native/applicationinsights";
import * as monitor from "@pulumi/azure-native/monitor";
import * as operationalinsights from "@pulumi/azure-native/operationalinsights";
import * as portal from "@pulumi/azure-native/portal";
import * as resources from "@pulumi/azure-native/resources";
import * as pulumi from "@pulumi/pulumi";
export interface ObservabilityArgs { notificationEmail: string; namePrefix: string; tags: Record<string, string>; }
export class Observability extends pulumi.ComponentResource {
  public readonly dashboardId: pulumi.Output<string>; public readonly notificationTarget: pulumi.Output<string>; public readonly traceHook: pulumi.Output<string>;
  constructor(name: string, args: ObservabilityArgs, opts?: pulumi.ComponentResourceOptions) {
    super("guides:productionObservability:Azure", name, {}, opts);
    const rg = new resources.ResourceGroup(`${name}-rg`, { resourceGroupName: args.namePrefix, tags: args.tags }, { parent: this });
    const workspace = new operationalinsights.Workspace(`${name}-workspace`, { resourceGroupName: rg.name, workspaceName: args.namePrefix, retentionInDays: 30, sku: { name: operationalinsights.WorkspaceSkuNameEnum.PerGB2018 }, tags: args.tags }, { parent: this });
    const app = new applicationinsights.Component(`${name}-appinsights`, { resourceGroupName: rg.name, resourceName: `${args.namePrefix}-app`, applicationType: "web", kind: "web", workspaceResourceId: workspace.id, tags: args.tags }, { parent: this });
    const group = new monitor.ActionGroup(`${name}-action-group`, { resourceGroupName: rg.name, actionGroupName: `${args.namePrefix}-alerts`, enabled: true, groupShortName: "obs", emailReceivers: [{ name: "team", emailAddress: args.notificationEmail, useCommonAlertSchema: true }], tags: args.tags }, { parent: this });
    // Metric alert rules are global resources, so location is "global" rather than the stack's region.
    new monitor.MetricAlert(`${name}-failed-requests`, { resourceGroupName: rg.name, ruleName: `${args.namePrefix}-failed-requests`, location: "global", enabled: true, scopes: [app.id], severity: 2, evaluationFrequency: "PT1M", windowSize: "PT5M", criteria: { allOf: [{ criterionType: "StaticThresholdCriterion", metricName: "requests/failed", metricNamespace: "microsoft.insights/components", name: "failedRequests", operator: "GreaterThan", threshold: 1, timeAggregation: "Count" }], odataType: "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria" }, actions: [{ actionGroupId: group.id }] }, { parent: this });
    const dashboard = new portal.Dashboard(`${name}-dashboard`, { resourceGroupName: rg.name, dashboardName: `${args.namePrefix}-dashboard`, location: rg.location, properties: { lenses: [{ order: 0, parts: [] }], metadata: {} }, tags: args.tags }, { parent: this });
    this.dashboardId = dashboard.id; this.notificationTarget = group.id; this.traceHook = app.connectionString.apply((v: string | undefined) => `APPLICATIONINSIGHTS_CONNECTION_STRING=${v!}`); this.registerOutputs({ dashboardId: this.dashboardId, notificationTarget: this.notificationTarget, traceHook: this.traceHook });
  }
}
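
The component creates the dashboard with an empty parts array, so the portal shows a blank canvas until tiles are added. As a starting point, the sketch below builds one Markdown tile as a plain object. markdownPart is a hypothetical helper, and the part type string and nested settings layout are assumptions based on the JSON the portal produces when a dashboard with a Markdown tile is exported; compare against an export from your own subscription before relying on it.

```typescript
// Hypothetical helper: builds one dashboard part that renders Markdown text.
// The type string and settings nesting are assumed from exported portal JSON.
function markdownPart(title: string, content: string, y: number) {
  return {
    // Position is expressed in dashboard grid units.
    position: { x: 0, y, colSpan: 6, rowSpan: 3 },
    metadata: {
      type: "Extension/HubsExtension/PartType/MarkdownPart",
      inputs: [],
      settings: { content: { settings: { title, subtitle: "", content } } },
    },
  };
}

// Example tile pointing operators at a runbook; the text is illustrative.
const runbookTile = markdownPart("Runbook", "Check failed requests first, then latency.", 0);
```

A tile like runbookTile could then take the place of the empty parts: [] in the dashboard's first lens, alongside metric tiles added later.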

components/observability.py

Creates the Azure Portal dashboards, Log Analytics wiring, alert rules, notification target, and Application Insights hook.

import pulumi
import pulumi_azure_native as azure_native
class Observability(pulumi.ComponentResource):
    def __init__(self, name, notification_email, name_prefix, tags, opts=None):
        super().__init__("guides:productionObservability:Azure", name, None, opts)
        child = pulumi.ResourceOptions(parent=self)
        rg = azure_native.resources.ResourceGroup(f"{name}-rg", resource_group_name=name_prefix, tags=tags, opts=child)
        workspace = azure_native.operationalinsights.Workspace(f"{name}-workspace", resource_group_name=rg.name, workspace_name=name_prefix, retention_in_days=30, sku={"name": azure_native.operationalinsights.WorkspaceSkuNameEnum.PER_GB2018}, tags=tags, opts=child)
        app = azure_native.applicationinsights.Component(f"{name}-appinsights", resource_group_name=rg.name, resource_name_=f"{name_prefix}-app", application_type="web", kind="web", workspace_resource_id=workspace.id, tags=tags, opts=child)
        group = azure_native.monitor.ActionGroup(f"{name}-action-group", resource_group_name=rg.name, action_group_name=f"{name_prefix}-alerts", enabled=True, group_short_name="obs", email_receivers=[{"name":"team","email_address":notification_email,"use_common_alert_schema":True}], tags=tags, opts=child)
        # Metric alert rules are global resources, so location is "global" rather than the stack's region.
        azure_native.monitor.MetricAlert(f"{name}-failed-requests", resource_group_name=rg.name, rule_name=f"{name_prefix}-failed-requests", location="global", enabled=True, scopes=[app.id], severity=2, evaluation_frequency="PT1M", window_size="PT5M", criteria={"odata_type":"Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria","all_of":[{"criterion_type":"StaticThresholdCriterion","metric_name":"requests/failed","metric_namespace":"microsoft.insights/components","name":"failedRequests","operator":"GreaterThan","threshold":1,"time_aggregation":"Count"}]}, actions=[{"action_group_id": group.id}], opts=child)
        dashboard = azure_native.portal.Dashboard(f"{name}-dashboard", resource_group_name=rg.name, dashboard_name=f"{name_prefix}-dashboard", location=rg.location, properties={"lenses":[{"order":0,"parts":[]}],"metadata":{}}, tags=tags, opts=child)
        self.dashboard_id = dashboard.id; self.notification_target = group.id; self.trace_hook = app.connection_string.apply(lambda v: f"APPLICATIONINSIGHTS_CONNECTION_STRING={v}")
        self.register_outputs({"dashboard_id": self.dashboard_id, "notification_target": self.notification_target, "trace_hook": self.trace_hook})

observability/observability.go

Creates the Azure Portal dashboards, Log Analytics wiring, alert rules, notification target, and Application Insights hook.

package observability

import (
    applicationinsights "github.com/pulumi/pulumi-azure-native-sdk/applicationinsights/v3"
    monitor "github.com/pulumi/pulumi-azure-native-sdk/monitor/v3"
    operationalinsights "github.com/pulumi/pulumi-azure-native-sdk/operationalinsights/v3"
    portal "github.com/pulumi/pulumi-azure-native-sdk/portal/v3"
    resources "github.com/pulumi/pulumi-azure-native-sdk/resources/v3"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

type ObservabilityArgs struct { NotificationEmail string; NamePrefix string; Tags pulumi.StringMap }
type Observability struct { pulumi.ResourceState; DashboardID pulumi.StringOutput; NotificationTarget pulumi.StringOutput; TraceHook pulumi.StringOutput }

func NewObservability(ctx *pulumi.Context, name string, args *ObservabilityArgs, opts ...pulumi.ResourceOption) (*Observability, error) {
    component := &Observability{}
    if err := ctx.RegisterComponentResource("guides:productionObservability:Azure", name, component, opts...); err != nil { return nil, err }
    child := pulumi.Parent(component)
    rg, err := resources.NewResourceGroup(ctx, name+"-rg", &resources.ResourceGroupArgs{ResourceGroupName: pulumi.String(args.NamePrefix), Tags: args.Tags}, child); if err != nil { return nil, err }
    workspace, err := operationalinsights.NewWorkspace(ctx, name+"-workspace", &operationalinsights.WorkspaceArgs{ResourceGroupName: rg.Name, WorkspaceName: pulumi.String(args.NamePrefix), RetentionInDays: pulumi.Int(30), Sku: &operationalinsights.WorkspaceSkuArgs{Name: pulumi.String(operationalinsights.WorkspaceSkuNameEnumPerGB2018)}, Tags: args.Tags}, child); if err != nil { return nil, err }
    app, err := applicationinsights.NewComponent(ctx, name+"-appinsights", &applicationinsights.ComponentArgs{ResourceGroupName: rg.Name, ResourceName: pulumi.String(args.NamePrefix+"-app"), ApplicationType: pulumi.String("web"), Kind: pulumi.String("web"), WorkspaceResourceId: workspace.ID(), Tags: args.Tags}, child); if err != nil { return nil, err }
    group, err := monitor.NewActionGroup(ctx, name+"-action-group", &monitor.ActionGroupArgs{ResourceGroupName: rg.Name, ActionGroupName: pulumi.String(args.NamePrefix+"-alerts"), Enabled: pulumi.Bool(true), GroupShortName: pulumi.String("obs"), EmailReceivers: monitor.EmailReceiverArray{&monitor.EmailReceiverArgs{Name: pulumi.String("team"), EmailAddress: pulumi.String(args.NotificationEmail), UseCommonAlertSchema: pulumi.Bool(true)}}, Tags: args.Tags}, child); if err != nil { return nil, err }
    _, err = monitor.NewMetricAlert(ctx, name+"-failed-requests", &monitor.MetricAlertArgs{ResourceGroupName: rg.Name, RuleName: pulumi.String(args.NamePrefix+"-failed-requests"), Location: pulumi.String("global"), Enabled: pulumi.Bool(true), Scopes: pulumi.StringArray{app.ID()}, Severity: pulumi.Int(2), EvaluationFrequency: pulumi.String("PT1M"), WindowSize: pulumi.String("PT5M"), Criteria: &monitor.MetricAlertSingleResourceMultipleMetricCriteriaArgs{AllOf: monitor.MetricCriteriaArray{&monitor.MetricCriteriaArgs{CriterionType: pulumi.String("StaticThresholdCriterion"), MetricName: pulumi.String("requests/failed"), MetricNamespace: pulumi.StringPtr("microsoft.insights/components"), Name: pulumi.String("failedRequests"), Operator: pulumi.String("GreaterThan"), Threshold: pulumi.Float64(1), TimeAggregation: pulumi.String("Count")}}, OdataType: pulumi.String("Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria")}, Actions: monitor.MetricAlertActionArray{&monitor.MetricAlertActionArgs{ActionGroupId: group.ID()}}}, child); if err != nil { return nil, err }
    dashboard, err := portal.NewDashboard(ctx, name+"-dashboard", &portal.DashboardArgs{ResourceGroupName: rg.Name, DashboardName: pulumi.String(args.NamePrefix+"-dashboard"), Location: rg.Location, Properties: &portal.DashboardPropertiesWithProvisioningStateArgs{Lenses: portal.DashboardLensArray{&portal.DashboardLensArgs{Order: pulumi.Int(0), Parts: portal.DashboardPartsArray{}}}}, Tags: args.Tags}, child); if err != nil { return nil, err }
    component.DashboardID = dashboard.ID().ToStringOutput(); component.NotificationTarget = group.ID().ToStringOutput(); component.TraceHook = app.ConnectionString.ApplyT(func(v string) string { return "APPLICATIONINSIGHTS_CONNECTION_STRING=" + v }).(pulumi.StringOutput)
    return component, nil
}

Frequently asked questions

Does this deploy an application?

It deploys only the smallest service hook needed to demonstrate log, metric, and trace wiring. Bring your real service names, metric filters, and alert thresholds before using the blueprint for production traffic.

Where does the notification email come from?

Each starter reads a Pulumi config value named notificationEmail. Set it to the address or distribution list your team controls before running pulumi up.

Does this include incident management or on-call rotation?

No. The blueprint stops at cloud-native email notification targets so you can connect your own incident workflow later without adding another platform to the starter.

What should I tune first?

Tune the error threshold, latency threshold, evaluation window, and dashboard widgets to match your service baseline after the first few deploys.

How do I clean it up?

Run pulumi destroy from the same stack, then remove any email subscription confirmation or notification channel that your cloud provider leaves pending outside Pulumi state.