Run an opinionated AKS cluster with Pulumi

Provision an opinionated AKS cluster on the Pulumi landing-zone network, preinstall External Secrets Operator plus Application Gateway for Containers and Node Auto Provisioning (NAP) through a reusable component, and export a kubeconfig downstream workloads can consume.

Before you deploy: deploy the Azure landing zone first.

This blueprint consumes shared network, identity, and secret-store outputs from the Azure landing-zone stack in the same cloud account. If you haven't deployed one yet, follow Build an Azure landing zone and come back with the stack name.

Download blueprint

Get this Azure blueprint project as a zip. Pick the Pulumi language that matches the install commands and blueprint program you want to follow on this page.

  • Download the TypeScript blueprint with the matching Pulumi program, dependency files, and README.
  • Download the Python blueprint with the matching Pulumi program, dependency files, and README.
  • Download the Go blueprint with the matching Pulumi program, dependency files, and README.

What this guide covers

A production-shaped managed Kubernetes blueprint that consumes the Pulumi landing-zone stack and ships with the controllers most teams install by hand on day one. One Pulumi stack provisions the cluster, the add-ons workloads expect, the workload-identity wiring downstream stacks need, and outputs they can consume by name.

The blueprint covers:

  • one Pulumi stack that provisions a managed AKS cluster inside the landing-zone virtual network
  • a small system node pool sized for the in-cluster controllers, with Node Auto Provisioning (NAP) handling every workload node on demand
  • pinned Helm installs of External Secrets Operator and the Application Gateway for Containers controller, plus the cloud-side data-plane resources Pulumi provisions as part of the same stack
  • Azure AD Workload Identity wired end-to-end for every service account the add-ons use, so pods call cloud APIs with short-lived tokens only
  • restricted Pod Security Admission labels on the add-on namespaces from the first deploy
  • a reusable Cluster component so other Pulumi projects can provision the same cluster shape in their own stacks
  • a Pulumi ESC environment and StackReference snippets every workload stack can import by name

Everything the blueprint creates is additive, so you can bring your own add-ons, node pools, or workloads on top without touching the module.

What gets deployed

In one Pulumi stack on Azure this blueprint provisions:

  • Cluster control plane: a managed AKS cluster at Kubernetes version 1.33 with Azure AD Workload Identity turned on so pods call cloud APIs with short-lived tokens instead of long-lived credentials.
  • System node pool: two Standard_D4s_v5 instances sized for the in-cluster controllers (External Secrets Operator and the alb-controller ingress controller; NAP itself runs inside the managed control plane). Every additional workload node is launched by Node Auto Provisioning (NAP) on demand.
  • Add-ons:
    • External Secrets Operator chart v2.3.0 installed in the external-secrets namespace with workload-identity-backed access to Azure Key Vault.
    • Application Gateway for Containers wired for Layer-7 ingress: the in-cluster controller is installed through Helm and the cloud-side data-plane service is provisioned by Pulumi so workload stacks can drop Ingress / Gateway / HTTPRoute resources on the first deploy.
    • Node Auto Provisioning (NAP) configured to launch workload nodes on demand with scoped IAM/identity and the landing-zone network.
  • Workload Identity: one identity per controller service account, scoped to a single namespace + service-account pair so no other pod can assume it.
  • Pod Security Admission: the restricted profile is enforced on the external-secrets and ingress-controller namespaces so privileged containers cannot land there by default.

Every resource is tagged with environment, solution-family, cloud, and language labels/tags so workload stacks can filter them later. Cluster control-plane logs ship to the cloud-native audit destination the landing-zone stack already wires up (CloudWatch on AWS, Log Analytics on Azure, Cloud Logging on GCP).

On Azure

The blueprint uses AKS for the control plane, the landing-zone virtual network (Azure CNI Overlay + Cilium dataplane) for pod networking, and AKS system + user node pools with Node Auto Provisioning so you run a tiny system pool and let Node Auto Provisioning scale every workload node on demand. Azure AD Workload Identity is enabled on the control plane so pods call Azure APIs with short-lived tokens, never with long-lived secrets.

The first deployment creates:

  • one AKS cluster at Kubernetes 1.33 on the Base / Standard SKU with OIDC issuer + Workload Identity turned on, AAD-managed RBAC for cluster admin, and Node Auto Provisioning in Auto mode
  • one system node pool of 2 Standard_D4s_v5 VMs on Azure Linux joined to the landing-zone subnet
  • user-assigned Managed Identities for External Secrets Operator and the Gateway API controller (the alb-controller chart that ships with Application Gateway for Containers; Microsoft's in-cluster controller for AGC, not AWS's unrelated ALB controller), each with a FederatedIdentityCredential scoped to one service account in one namespace so no other pod can assume that identity
  • a Pod Security Admission restricted label on the add-on namespaces (external-secrets, azure-alb-system) so privileged workloads cannot land there
  • the External Secrets Operator Helm release wired to Azure Key Vault through the landing-zone Key Vault, with azure.workload.identity/client-id on the service account
  • an Application Gateway for Containers TrafficController + default Frontend on the cloud side, plus Microsoft’s alb-controller Helm release in azure-alb-system so workload stacks can create Gateway / HTTPRoute resources without extra setup

Quickstart

Deploy the landing-zone stack first, then point this stack at it. The landing-zone stack owns the shared primitives this cluster plugs into: the landing-zone virtual network the nodes run on, the customer-managed encryption key the control plane uses, the deployer identity that needs kubectl access, and the Azure Key Vault instance External Secrets Operator reads from. Keeping those in a separate stack lets one team own the account foundation while many teams stand up their own AKS clusters against it, and destroying a cluster never tears down the network other stacks depend on. This stack reads those outputs over a StackReference and fails fast if any are missing, so a missing landing-zone stack is the first thing pulumi up complains about.

  1. Make sure the Pulumi landing-zone stack for this cloud is already up. If not, follow the Azure landing-zone guide before coming back.
  2. Download the example zip at the top of the page and unzip it.
  3. Open a terminal in the extracted project root.
  4. Install the Pulumi dependencies for the language you want to use:
# TypeScript
npm install

# Python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Go
go mod tidy
  5. For a first local test, keep using whichever Azure credentials already work in your shell. If you want a shared or repeatable setup, use the Pulumi ESC section below before continuing.
  6. Create the stack, tell it which landing-zone stack to consume, and deploy:
pulumi login
pulumi stack init dev
pulumi config set azure-native:location eastus
pulumi config set landingZoneStack <your-org>/landing-zone/dev
pulumi up
  7. When the update finishes, export the kubeconfig and verify the cluster:
pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
KUBECONFIG=./kubeconfig.yaml kubectl get nodes
KUBECONFIG=./kubeconfig.yaml kubectl get pods -A

You should see the system nodes Ready and both add-on controllers (external-secrets and alb-controller) running; Node Auto Provisioning runs inside the AKS control plane, so there is no Karpenter pod to look for.

Prerequisites

  • a Pulumi account and the Pulumi CLI installed
  • the Pulumi landing-zone stack already deployed in this Azure account
  • kubectl on your path
  • Helm 3.14 or newer only if you want to run helm against the cluster by hand; the blueprint itself installs charts through the Pulumi Helm Release resource
  • an Azure subscription where the Pulumi landing-zone stack is already deployed and you have Owner / User Access Administrator rights to create AKS, Managed Identity, role assignments, and Key Vault
  • Node.js 20 or newer and npm for the TypeScript variant, or the matching Python / Go toolchain for those variants

Consume the landing-zone stack

This stack reads the outputs it needs from the landing-zone stack through a StackReference. For Azure:

  • resourceGroupName - owner of the AKS cluster and the Managed Identities this stack creates
  • clusterSubnetId - the landing-zone subnet for the system node pool
  • deployerPrincipalId - granted AKS RBAC Cluster Admin so your deployer identity can kubectl
  • secretsStore - the Key Vault name External Secrets Operator will read from

Set which landing-zone stack to read:

pulumi config set landingZoneStack <your-org>/landing-zone/dev

The blueprint resolves that config value into a pulumi.StackReference at runtime and fails fast if any output it needs is missing. If you want to run this blueprint against a network you already manage, replace the StackReference block in the entrypoint with the ids you already have - the Cluster component does not care where those values come from.
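
For example, a minimal bring-your-own sketch in TypeScript: swap the four requireOutput lines in index.ts for literal values and leave the rest of the program unchanged. Every value below is a placeholder, not output from any real stack.

// Bring-your-own values in place of the StackReference block in index.ts.
// Substitute the resource group, subnet, deployer principal, and Key Vault you
// already manage; pulumi.output() keeps the same Output<string> types the
// Cluster component expects.
const resourceGroupName = pulumi.output("rg-platform");
const subnetId = pulumi.output(
    "/subscriptions/<subscription-id>/resourceGroups/rg-network/providers/Microsoft.Network/virtualNetworks/vnet-hub/subnets/snet-aks",
);
const deployerPrincipalId = pulumi.output("<deployer-object-id>");
const keyVaultName = pulumi.output("kv-platform-secrets");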

Set up credentials with Pulumi ESC

Before you run pulumi up, configure Pulumi ESC so your stack receives short-lived Azure credentials through ESC's dynamic azure-login provider.

If you already have working Azure credentials in your shell and only want a quick local test, you can skip this section. The landing-zone family has a longer walkthrough that applies here verbatim; reuse the same ESC environment between landing-zone and AKS stacks so cluster upgrades run with the same deployer identity that created the network.

Step 1: Create or update an ESC environment

imports:
  - <your-org>/base
values:
  azure:
    login:
      fn::open::azure-login:
        clientId: 00000000-0000-0000-0000-000000000000
        tenantId: 00000000-0000-0000-0000-000000000000
        subscriptionId: /subscriptions/00000000-0000-0000-0000-000000000000
        oidc: true
  pulumiConfig:
    azure-native:location: eastus

Step 2: Attach the environment to your stack

In Pulumi.dev.yaml, add:

environment:
  - <your-org>/<your-environment>

Pulumi picks up the environment automatically on pulumi preview, pulumi up, and pulumi destroy. You do not need to run esc open <your-org>/<your-environment> first.

What you get in the download

The downloadable example zip includes:

  • index.ts as the Pulumi entrypoint
  • components/cluster.ts as the reusable Cluster module
  • package.json and tsconfig.json for the root Pulumi project
  • README.md with the same commands you will see on this page
  • __main__.py as the Pulumi entrypoint
  • components/cluster.py as the reusable Cluster module
  • requirements.txt for the root Pulumi project
  • main.go as the Pulumi entrypoint
  • cluster/cluster.go as the reusable Cluster module
  • go.mod for the root Pulumi project

The entrypoint stays small: it loads the landing-zone outputs, reads a handful of config values, and instantiates the reusable Cluster component. The component file is where the cluster shape, add-on installs, and workload-identity bindings live.

Deploy with Pulumi

Run these from the extracted project root.

Step 1: Install the root Pulumi dependencies for the language you want to use

# TypeScript
npm install

# Python
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Go
go mod tidy

Step 2: Create a Pulumi stack and point it at your landing-zone stack

pulumi login
pulumi stack init dev
pulumi config set azure-native:location eastus
pulumi config set landingZoneStack <your-org>/landing-zone/dev

If you already created the stack, pulumi stack select dev instead.

Step 3: Deploy

pulumi up

Approve the preview when Pulumi asks.

The first run creates the AKS control plane (with OIDC issuer + Workload Identity on), the system node pool, Managed Identities and FederatedIdentityCredential resources for each controller, the AGC TrafficControllerInterface + default Frontend on the cloud side, and the two Helm releases (External Secrets Operator plus Microsoft’s alb-controller chart). Expect 8-12 minutes on a cold subscription; most of that time is AKS provisioning.

Pulumi imports the ESC environment automatically through the environment: reference in your stack config. You do not need esc open <your-org>/<your-environment> before pulumi up.

Step 4: Verify the cluster

pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
KUBECONFIG=./kubeconfig.yaml kubectl get nodes
KUBECONFIG=./kubeconfig.yaml kubectl -n external-secrets rollout status deploy/external-secrets
KUBECONFIG=./kubeconfig.yaml kubectl -n azure-alb-system rollout status deploy/alb-controller

Controllers should report successfully rolled out once healthy.

Stack outputs

Every variant exports the same top-level shape so downstream Pulumi projects can consume the cluster the same way regardless of cloud. Run pulumi stack output --show-secrets after pulumi up to see values.

Common across AWS, Azure, and GCP:

  • kubeconfig (Pulumi secret) - authenticated kubeconfig you can feed into a Kubernetes provider, for example new k8s.Provider("workload", { kubeconfig })
  • clusterName - the provider-assigned cluster name
  • clusterEndpoint - the control-plane API endpoint
  • clusterCertificateAuthority - base64 CA cert, useful when the downstream stack builds its own kubeconfig
  • escEnvironment - the Pulumi ESC environment name workload stacks import by reference

AKS-specific:

  • oidcIssuerUrl - the cluster’s OIDC issuer URL, used when downstream stacks create their own FederatedIdentityCredential
  • externalSecretsIdentityClientId - the client id of the Managed Identity the ESO service account federates to
  • ingressControllerIdentityClientId - the client id the AGC Gateway API controller service account federates to
  • trafficControllerId - the Application Gateway for Containers resource id; attach per-environment Frontends, subnet Associations, and Gateway API routes against it

Add-ons

What is installed

Every variant installs the same three things, with cloud-appropriate wiring:

  • External Secrets Operator (chart v2.3.0, namespace external-secrets) syncs secrets from Azure Key Vault into Kubernetes Secret objects. Its service account uses Azure AD Workload Identity so the operator authenticates with short-lived tokens.
  • Application Gateway for Containers is the Layer-7 entry point this cluster will answer Ingress / Gateway / HTTPRoute resources on.
  • Node Auto Provisioning (NAP) handles workload-node launches. The system pool stays small; every additional node is launched by the autoscaler when a pending pod cannot fit.

The Azure variant turns on Node Auto Provisioning in Auto mode on the ManagedCluster itself (nodeProvisioningProfile.mode: Auto, Azure CNI Overlay + Cilium dataplane). There is no separate Karpenter Helm release to install; the AKS control plane runs the provisioning logic. For Gateway API ingress, Pulumi provisions the Application Gateway for Containers resource (TrafficControllerInterface) plus a default Frontend on the cloud side, and installs Microsoft's alb-controller Helm chart on the cluster side (the in-cluster controller for Application Gateway for Containers) with a federated Managed Identity.
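
As a sketch of what a workload stack can then deploy, here is a Gateway bound to the provisioned AGC resource, written with Pulumi's CustomResource. The gateway class name (azure-alb-external), the alb.networking.azure.io/alb-id annotation, and the alb-frontend address type follow Microsoft's AGC documentation for bring-your-own deployments; verify them against the current AGC docs. The frontend value "default" matches the Frontend this blueprint creates.

import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

const platform = new pulumi.StackReference("your-org/kubernetes/dev");
const provider = new k8s.Provider("workload", {
    kubeconfig: platform.requireOutput("kubeconfig"),
});

// Gateway API entry point bound to the AGC resource this blueprint provisioned.
new k8s.apiextensions.CustomResource("app-gateway", {
    apiVersion: "gateway.networking.k8s.io/v1",
    kind: "Gateway",
    metadata: {
        name: "app-gateway",
        namespace: "default",
        annotations: {
            "alb.networking.azure.io/alb-id": platform.requireOutput("trafficControllerId"),
        },
    },
    spec: {
        gatewayClassName: "azure-alb-external",
        listeners: [{ name: "http", port: 80, protocol: "HTTP" }],
        addresses: [{ type: "alb.networking.azure.io/alb-frontend", value: "default" }],
    },
}, { provider });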

Pod Security Admission

The add-on namespaces (external-secrets, plus the ingress-controller namespace for this cloud) are labelled with pod-security.kubernetes.io/enforce: restricted from the first deploy, matching the Kubernetes project's recommended baseline for platform add-ons. Put application workloads in new namespaces with your own PSA labels so the cluster never starts out with permissive defaults (a short sketch follows).
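
A minimal TypeScript sketch of that pattern, creating a workload namespace with its own PSA labels from a downstream stack. The namespace name and the baseline enforce level are illustrative; the kubeconfig comes from this stack's outputs as shown under "Consume the cluster from workload stacks".

import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

const provider = new k8s.Provider("platform-cluster", {
    kubeconfig: new pulumi.Config().requireSecret("kubeconfig"),
});

// Workload namespace with its own Pod Security Admission labels.
new k8s.core.v1.Namespace("payments", {
    metadata: {
        name: "payments",
        labels: {
            "pod-security.kubernetes.io/enforce": "baseline",
            "pod-security.kubernetes.io/warn": "restricted",
        },
    },
}, { provider });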

Add-on controls

Each add-on has a config flag. Disable any of them at pulumi up time:

pulumi config set enableExternalSecrets false
pulumi config set enableIngressController false

Node Auto Provisioning is controlled by the cluster’s nodeProvisioningProfile.mode setting, not by a Pulumi config flag.

Keeping an add-on disabled skips the Helm release and the identity resources that support it, so nothing orphans in your account. You can re-enable later and pulumi up again.

Add another add-on

The Cluster component exports the cluster's kubeconfig, so from the same program (or any other Pulumi project) you can build a Kubernetes provider from it and drop in additional kubernetes.helm.v3.Release resources; they install on the same cluster alongside the blueprint add-ons. Keep workload-identity bindings inside the component if an add-on needs access to cloud APIs so the audit story stays consistent.
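
For example, a hedged sketch that installs cert-manager next to the blueprint add-ons from the same program; cluster is the Cluster component instance from the entrypoint, and the chart version shown is only an example to pin deliberately.

import * as k8s from "@pulumi/kubernetes";

// Build a provider from the kubeconfig the component already exports.
const addOnProvider = new k8s.Provider("add-on-provider", {
    kubeconfig: cluster.kubeconfig,
});

// Any extra Helm release against that provider lands on the same cluster.
new k8s.helm.v3.Release("cert-manager", {
    name: "cert-manager",
    chart: "cert-manager",
    version: "v1.16.2", // pin whichever version you standardize on
    namespace: "cert-manager",
    createNamespace: true,
    repositoryOpts: { repo: "https://charts.jetstack.io" },
    values: { installCRDs: true },
}, { provider: addOnProvider });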

What the blueprint does NOT install

Intentionally out of scope for the first deploy: a full observability stack (Prometheus / Grafana / Loki) and a GitOps controller (Flux / Argo CD). Both are worth adding early - follow the pattern above or add them as dedicated families later.

Consume the cluster from workload stacks

Once the stack is up, every Pulumi workload project in the same Azure account can deploy into the cluster. Two patterns, pick whichever fits your team.

Pattern 1: Pulumi ESC environment

The stack attaches a Pulumi ESC environment (escEnvironment output). Downstream projects import it with one line in their stack config:

environment:
  - your-org/azure-kubernetes-dev

After that, a kubernetes.Provider instantiated from pulumi.Config().requireSecret("kubeconfig") talks directly to this cluster.
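
A minimal TypeScript sketch of this pattern; it assumes the ESC environment maps the cluster's kubeconfig output into stack config under the kubeconfig key, as described above.

import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

// The imported ESC environment surfaces the kubeconfig as a secret config value.
const kubeconfig = new pulumi.Config().requireSecret("kubeconfig");
const provider = new k8s.Provider("platform-cluster", { kubeconfig });
// Pass { provider } to any Kubernetes resource or Helm release in this project.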

Pattern 2: StackReference

If you prefer explicit wiring, use a StackReference:

// TypeScript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

const cluster = new pulumi.StackReference("your-org/kubernetes/dev");
const kubeconfig = cluster.requireOutput("kubeconfig") as pulumi.Output<string>;

const provider = new k8s.Provider("workload", { kubeconfig });

# Python
import pulumi
import pulumi_kubernetes as k8s

cluster = pulumi.StackReference("your-org/kubernetes/dev")
kubeconfig = cluster.require_output("kubeconfig")

provider = k8s.Provider("workload", kubeconfig=kubeconfig)

// Go (inside your pulumi.Run function; kubernetes is
// github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes)
cluster, err := pulumi.NewStackReference(ctx, "your-org/kubernetes/dev", nil)
if err != nil {
    return err
}
kubeconfig := cluster.GetStringOutput(pulumi.String("kubeconfig"))

provider, err := kubernetes.NewProvider(ctx, "workload", &kubernetes.ProviderArgs{
    Kubeconfig: kubeconfig,
})
if err != nil {
    return err
}

Running workloads on Node Auto Provisioning (NAP)

Every variant launches nodes on demand; you do not need to manage node pools manually for application workloads.

Node Auto Provisioning watches pending pods directly; as long as you do not add taints or node selectors that pin pods elsewhere, it launches a node that fits. If you need a specific VM family or capacity type, express it through nodeSelector keys (for example karpenter.sh/capacity-type: on-demand) or apply AKSNodeClass / NodePool CRDs against the managed Karpenter-for-Azure controller that NAP runs for you, as in the nodeSelector sketch below.
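
A hedged TypeScript sketch of a workload that NAP will place on an on-demand node; the deployment name, image, and resource requests are placeholders, and the nodeSelector line is optional.

import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

const provider = new k8s.Provider("platform-cluster", {
    kubeconfig: new pulumi.Config().requireSecret("kubeconfig"),
});

// No taints or special selectors needed: pending pods trigger NAP to launch a node.
new k8s.apps.v1.Deployment("api", {
    metadata: { name: "api", namespace: "default" },
    spec: {
        replicas: 3,
        selector: { matchLabels: { app: "api" } },
        template: {
            metadata: { labels: { app: "api" } },
            spec: {
                // Optional: pin to on-demand capacity through the Karpenter label NAP understands.
                nodeSelector: { "karpenter.sh/capacity-type": "on-demand" },
                containers: [{
                    name: "api",
                    image: "nginx:1.27",
                    resources: { requests: { cpu: "500m", memory: "512Mi" } },
                }],
            },
        },
    },
}, { provider });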

Using External Secrets

Create SecretStore (or ClusterSecretStore) and ExternalSecret resources that point at Azure Key Vault.

Provider: azurekv. Authenticate with workloadIdentity - the external-secrets service account is already labelled azure.workload.identity/use: "true" and annotated with the Managed Identity client id. Point the SecretStore at the Key Vault the landing-zone stack provisioned.
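
A hedged sketch of both resources written as Pulumi CustomResources; the vault URL, secret names, and refresh interval are placeholders, and the external-secrets.io API version may differ depending on the chart version you installed.

import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

const provider = new k8s.Provider("platform-cluster", {
    kubeconfig: new pulumi.Config().requireSecret("kubeconfig"),
});

// SecretStore bound to the landing-zone Key Vault through the workload identity
// the blueprint already wired onto the external-secrets service account.
const store = new k8s.apiextensions.CustomResource("azure-kv", {
    apiVersion: "external-secrets.io/v1beta1",
    kind: "SecretStore",
    metadata: { name: "azure-kv", namespace: "external-secrets" },
    spec: {
        provider: {
            azurekv: {
                authType: "WorkloadIdentity",
                vaultUrl: "https://<your-vault-name>.vault.azure.net",
                serviceAccountRef: { name: "external-secrets" },
            },
        },
    },
}, { provider });

// ExternalSecret that syncs one Key Vault secret into a Kubernetes Secret.
new k8s.apiextensions.CustomResource("app-db-password", {
    apiVersion: "external-secrets.io/v1beta1",
    kind: "ExternalSecret",
    metadata: { name: "app-db-password", namespace: "external-secrets" },
    spec: {
        refreshInterval: "1h",
        secretStoreRef: { name: "azure-kv", kind: "SecretStore" },
        target: { name: "app-db-password" },
        data: [{ secretKey: "password", remoteRef: { key: "db-password" } }],
    },
}, { provider, dependsOn: [store] });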

Set up CI/CD with Pulumi Deployments

A managed AKS cluster is something you want updated from a tracked source, not from a laptop. Pulumi Deployments runs pulumi up from your GitHub repository whenever you merge to a branch.

What you will configure in Pulumi Deployments for this project:

  • the Git repository and branch holding the unzipped blueprint
  • the stack name (for example your-org/azure-kubernetes/dev)
  • the root dependency command for the language you picked (npm install)
  • the Pulumi ESC environment reference, so Deployments receives the same short-lived credentials as your local run
  • the landingZoneStack config value so Deployments knows which landing-zone stack to consume

Once Deployments is wired up, land add-on upgrades, Kubernetes version bumps, and node-pool changes through PRs. Workload stacks that consume this cluster pick up the new outputs automatically on their next pulumi up.

Blueprint Pulumi program

The blueprint keeps the entrypoint tight: it reads landing-zone outputs, configures the cluster, and instantiates the reusable Cluster component.

index.ts

import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure-native";
import { Cluster } from "./components/cluster";

const config = new pulumi.Config();
const landingZoneStackName = config.require("landingZoneStack");
const clusterVersion = config.get("clusterVersion") ?? "1.33";
const systemNodeVmSize = config.get("systemNodeVmSize") ?? "Standard_D4s_v5";
const systemNodeCount = config.getNumber("systemNodeCount") ?? 2;
const enableExternalSecrets = config.getBoolean("enableExternalSecrets") ?? true;
const enableIngressController = config.getBoolean("enableIngressController") ?? true;
const location = new pulumi.Config("azure-native").require("location");

const landingZone = new pulumi.StackReference(landingZoneStackName);
const resourceGroupName = landingZone.requireOutput("resourceGroupName") as pulumi.Output<string>;
const subnetId = landingZone.requireOutput("clusterSubnetId") as pulumi.Output<string>;
const deployerPrincipalId = landingZone.requireOutput("deployerPrincipalId") as pulumi.Output<string>;
const keyVaultName = landingZone.requireOutput("secretsStore") as pulumi.Output<string>;

const clusterName = `${pulumi.getStack()}-aks`;

const cluster = new Cluster("platform", {
    clusterName,
    resourceGroupName,
    location,
    subnetId,
    deployerPrincipalId,
    secretsKeyVaultName: keyVaultName,
    version: clusterVersion,
    systemNodeCount,
    systemNodeVmSize,
    enableExternalSecrets,
    enableIngressController,
    externalSecretsChartVersion: "2.3.0",
    albControllerChartVersion: "1.7.6",
    tags: {
        environment: pulumi.getStack(),
        "solution-family": "kubernetes",
        cloud: "azure",
        language: "typescript",
    },
});

export const kubeconfig = cluster.kubeconfig;
export const clusterNameOut = clusterName;
export const clusterEndpoint = cluster.clusterEndpoint;
export const clusterCertificateAuthority = cluster.clusterCertificateAuthority;
export const oidcIssuerUrl = cluster.oidcIssuerUrl;
export const externalSecretsIdentityClientId = cluster.externalSecretsIdentityClientId;
export const ingressControllerIdentityClientId = cluster.ingressControllerIdentityClientId;
export const trafficControllerId = cluster.trafficControllerId;
export const escEnvironment = `${pulumi.getStack()}-aks`;

__main__.py

import pulumi
from components import Cluster, ClusterArgs

config = pulumi.Config()
landing_zone_stack_name = config.require("landingZoneStack")
cluster_version = config.get("clusterVersion") or "1.33"
system_node_vm_size = config.get("systemNodeVmSize") or "Standard_D4s_v5"
system_node_count = config.get_int("systemNodeCount") or 2
enable_external_secrets = config.get_bool("enableExternalSecrets")
if enable_external_secrets is None:
    enable_external_secrets = True
enable_ingress_controller = config.get_bool("enableIngressController")
if enable_ingress_controller is None:
    enable_ingress_controller = True
location = pulumi.Config("azure-native").require("location")

landing_zone = pulumi.StackReference(landing_zone_stack_name)
resource_group_name = landing_zone.require_output("resourceGroupName")
subnet_id = landing_zone.require_output("clusterSubnetId")
deployer_principal_id = landing_zone.require_output("deployerPrincipalId")
key_vault_name = landing_zone.require_output("secretsStore")

cluster_name = f"{pulumi.get_stack()}-aks"

cluster = Cluster(
    "platform",
    ClusterArgs(
        cluster_name=cluster_name,
        resource_group_name=resource_group_name,
        location=location,
        subnet_id=subnet_id,
        deployer_principal_id=deployer_principal_id,
        secrets_key_vault_name=key_vault_name,
        version=cluster_version,
        system_node_count=system_node_count,
        system_node_vm_size=system_node_vm_size,
        enable_external_secrets=enable_external_secrets,
        enable_ingress_controller=enable_ingress_controller,
        external_secrets_chart_version="2.3.0",
        alb_controller_chart_version="1.7.6",
        tags={
            "environment": pulumi.get_stack(),
            "solution-family": "kubernetes",
            "cloud": "azure",
            "language": "python",
        },
    ),
)

pulumi.export("kubeconfig", cluster.kubeconfig)
pulumi.export("clusterName", cluster_name)
pulumi.export("clusterEndpoint", cluster.cluster_endpoint)
pulumi.export("clusterCertificateAuthority", cluster.cluster_certificate_authority)
pulumi.export("oidcIssuerUrl", cluster.oidc_issuer_url)
pulumi.export("externalSecretsIdentityClientId", cluster.external_secrets_identity_client_id)
pulumi.export("ingressControllerIdentityClientId", cluster.ingress_controller_identity_client_id)
pulumi.export("trafficControllerId", cluster.traffic_controller_id)
pulumi.export("escEnvironment", f"{pulumi.get_stack()}-aks")

main.go

package main

import (
	"fmt"

	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"

	"kubernetes-azure/cluster"
)

func main() {
	pulumi.Run(Program)
}

func Program(ctx *pulumi.Context) error {
	cfg := config.New(ctx, "")
	landingZoneStackName := cfg.Require("landingZoneStack")
	clusterVersion := cfg.Get("clusterVersion")
	if clusterVersion == "" {
		clusterVersion = "1.33"
	}
	systemNodeVmSize := cfg.Get("systemNodeVmSize")
	if systemNodeVmSize == "" {
		systemNodeVmSize = "Standard_D4s_v5"
	}
	systemNodeCount := cfg.GetInt("systemNodeCount")
	if systemNodeCount == 0 {
		systemNodeCount = 2
	}
	enableExternalSecrets := true
	if v, err := cfg.TryBool("enableExternalSecrets"); err == nil {
		enableExternalSecrets = v
	}
	enableIngressController := true
	if v, err := cfg.TryBool("enableIngressController"); err == nil {
		enableIngressController = v
	}
	location := config.New(ctx, "azure-native").Require("location")

	landingZone, err := pulumi.NewStackReference(ctx, landingZoneStackName, nil)
	if err != nil {
		return err
	}

	resourceGroupName := landingZone.GetStringOutput(pulumi.String("resourceGroupName"))
	subnetId := landingZone.GetStringOutput(pulumi.String("clusterSubnetId"))
	deployerPrincipalId := landingZone.GetStringOutput(pulumi.String("deployerPrincipalId"))
	keyVaultName := landingZone.GetStringOutput(pulumi.String("secretsStore"))

	clusterName := fmt.Sprintf("%s-aks", ctx.Stack())

	c, err := cluster.New(ctx, "platform", &cluster.Args{
		ClusterName:                 pulumi.String(clusterName),
		ResourceGroupName:           resourceGroupName,
		Location:                    pulumi.String(location),
		SubnetId:                    subnetId,
		DeployerPrincipalId:         deployerPrincipalId,
		SecretsKeyVaultName:         keyVaultName,
		Version:                     pulumi.String(clusterVersion),
		SystemNodeCount:             pulumi.Int(systemNodeCount),
		SystemNodeVmSize:            pulumi.String(systemNodeVmSize),
		EnableExternalSecrets:       enableExternalSecrets,
		EnableIngressController:     enableIngressController,
		ExternalSecretsChartVersion: "2.3.0",
		AlbControllerChartVersion:   "1.7.6",
		Tags: pulumi.StringMap{
			"environment":     pulumi.String(ctx.Stack()),
			"solution-family": pulumi.String("kubernetes"),
			"cloud":           pulumi.String("azure"),
			"language":        pulumi.String("go"),
		},
	})
	if err != nil {
		return err
	}

	ctx.Export("kubeconfig", c.Kubeconfig)
	ctx.Export("clusterName", pulumi.String(clusterName))
	ctx.Export("clusterEndpoint", c.ClusterEndpoint)
	ctx.Export("clusterCertificateAuthority", c.ClusterCertificateAuthority)
	ctx.Export("oidcIssuerUrl", c.OidcIssuerUrl)
	ctx.Export("externalSecretsIdentityClientId", c.ExternalSecretsIdentityClientId)
	ctx.Export("ingressControllerIdentityClientId", c.IngressControllerIdentityClientId)
	ctx.Export("trafficControllerId", c.TrafficControllerId)
	ctx.Export("escEnvironment", pulumi.Sprintf("%s-aks", ctx.Stack()))
	return nil
}

Reusable components

The cluster wiring and add-on installs live in a reusable module so you can import it from other Pulumi projects or adapt it per team.

components/cluster.ts

Provisions the AKS cluster, a system node pool sized for the controllers, workload-identity wiring (Azure AD Workload Identity), and the Helm releases for External Secrets Operator and the ingress controller for this cloud.

import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure-native";
import * as k8s from "@pulumi/kubernetes";

export interface ClusterArgs {
    clusterName: pulumi.Input<string>;
    resourceGroupName: pulumi.Input<string>;
    location: pulumi.Input<string>;
    subnetId: pulumi.Input<string>;
    deployerPrincipalId: pulumi.Input<string>;
    secretsKeyVaultName: pulumi.Input<string>;
    version: pulumi.Input<string>;
    systemNodeCount: pulumi.Input<number>;
    systemNodeVmSize: pulumi.Input<string>;
    enableExternalSecrets: boolean;
    enableIngressController: boolean;
    externalSecretsChartVersion: string;
    albControllerChartVersion: string;
    tags?: pulumi.Input<{ [key: string]: pulumi.Input<string> }>;
}

/**
 * Opinionated AKS cluster wired to the landing-zone VNet. Enables OIDC issuer,
 * Workload Identity, Node Auto Provisioning, and installs External Secrets Operator
 * + the Application Load Balancer Controller for Gateway API routing through AGC.
 */
export class Cluster extends pulumi.ComponentResource {
    public readonly managedCluster: azure.containerservice.ManagedCluster;
    public readonly kubeconfig: pulumi.Output<string>;
    public readonly clusterEndpoint: pulumi.Output<string>;
    public readonly clusterCertificateAuthority: pulumi.Output<string>;
    public readonly oidcIssuerUrl: pulumi.Output<string>;
    public readonly tenantId: pulumi.Output<string>;
    public readonly externalSecretsIdentityClientId: pulumi.Output<string>;
    public readonly ingressControllerIdentityClientId: pulumi.Output<string>;
    public readonly trafficControllerId: pulumi.Output<string>;

    constructor(name: string, args: ClusterArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:azure:Cluster", name, {}, opts);
        const tags = args.tags ?? {};
        const client = azure.authorization.getClientConfigOutput({ parent: this });

        // AKS cluster: OIDC + Workload Identity + Node Auto Provisioning, Azure CNI overlay + Cilium.
        const cluster = new azure.containerservice.ManagedCluster(
            `${name}-aks`,
            {
                resourceName: args.clusterName,
                resourceGroupName: args.resourceGroupName,
                location: args.location,
                dnsPrefix: args.clusterName,
                kubernetesVersion: args.version,
                sku: { name: "Base", tier: "Standard" },
                identity: { type: azure.containerservice.ResourceIdentityType.SystemAssigned },
                oidcIssuerProfile: { enabled: true },
                securityProfile: {
                    workloadIdentity: { enabled: true },
                },
                nodeProvisioningProfile: { mode: "Auto" },
                networkProfile: {
                    networkPlugin: "azure",
                    networkPluginMode: "overlay",
                    networkDataplane: "cilium",
                    networkPolicy: "cilium",
                    serviceCidr: "10.100.0.0/16",
                    dnsServiceIP: "10.100.0.10",
                },
                agentPoolProfiles: [
                    {
                        name: "system",
                        mode: "System",
                        count: args.systemNodeCount,
                        vmSize: args.systemNodeVmSize,
                        osType: "Linux",
                        osSKU: "AzureLinux",
                        vnetSubnetID: args.subnetId,
                        type: azure.containerservice.AgentPoolType.VirtualMachineScaleSets,
                    },
                ],
                aadProfile: {
                    managed: true,
                    enableAzureRBAC: true,
                    tenantID: client.tenantId,
                    adminGroupObjectIDs: [],
                },
                tags,
            },
            { parent: this },
        );

        // Grant the landing-zone deployer Cluster Admin on the control plane so kubectl works.
        const deployerRole = "b1ff04bb-8a4e-4dc4-8eb5-8693973ce19b"; // Azure Kubernetes Service RBAC Cluster Admin
        new azure.authorization.RoleAssignment(
            `${name}-deployer-admin`,
            {
                principalId: args.deployerPrincipalId,
                principalType: "ServicePrincipal",
                roleDefinitionId: pulumi.interpolate`/subscriptions/${client.subscriptionId}/providers/Microsoft.Authorization/roleDefinitions/${deployerRole}`,
                scope: cluster.id,
            },
            { parent: this },
        );

        // Fetch the user kubeconfig (scoped through AAD RBAC) for the in-cluster provider and stack outputs.
        const adminCreds = azure.containerservice.listManagedClusterUserCredentialsOutput(
            { resourceGroupName: args.resourceGroupName, resourceName: cluster.name },
            { parent: this },
        );
        const kubeconfig = adminCreds.kubeconfigs.apply((configs) =>
            Buffer.from(configs![0].value, "base64").toString("utf-8"),
        );
        const k8sProvider = new k8s.Provider(
            `${name}-k8s`,
            { kubeconfig, enableServerSideApply: true },
            { parent: this },
        );

        // Helper: federate a KSA to a user-assigned managed identity.
        const federate = (
            id: string,
            namespace: string,
            serviceAccount: string,
        ): { identity: azure.managedidentity.UserAssignedIdentity; clientId: pulumi.Output<string> } => {
            const identity = new azure.managedidentity.UserAssignedIdentity(
                `${name}-${id}-id`,
                {
                    resourceGroupName: args.resourceGroupName,
                    location: args.location,
                    tags,
                },
                { parent: this },
            );
            new azure.managedidentity.FederatedIdentityCredential(
                `${name}-${id}-fed`,
                {
                    resourceGroupName: args.resourceGroupName,
                    resourceName: identity.name,
                    issuer: cluster.oidcIssuerProfile.apply((p) => p!.issuerURL!),
                    subject: `system:serviceaccount:${namespace}:${serviceAccount}`,
                    audiences: ["api://AzureADTokenExchange"],
                },
                { parent: this },
            );
            return { identity, clientId: identity.clientId };
        };

        // Namespaces with Pod Security Admission enforcement (restricted).
        const pssLabels = {
            "pod-security.kubernetes.io/enforce": "restricted",
            "pod-security.kubernetes.io/enforce-version": "latest",
            "pod-security.kubernetes.io/audit": "restricted",
            "pod-security.kubernetes.io/warn": "restricted",
        };

        const esoNamespace = new k8s.core.v1.Namespace(
            `${name}-eso-ns`,
            {
                metadata: { name: "external-secrets", labels: pssLabels },
            },
            { provider: k8sProvider, parent: this },
        );
        const ingressNamespace = new k8s.core.v1.Namespace(
            `${name}-ingress-ns`,
            {
                metadata: { name: "azure-alb-system", labels: pssLabels },
            },
            { provider: k8sProvider, parent: this },
        );

        // External Secrets Operator identity + Key Vault role assignment.
        const esoFederation = federate("eso", "external-secrets", "external-secrets");
        const keyVaultSecretsUser = "4633458b-17de-408a-b874-0445c86b69e6";
        new azure.authorization.RoleAssignment(
            `${name}-eso-kv-access`,
            {
                principalId: esoFederation.identity.principalId,
                principalType: "ServicePrincipal",
                roleDefinitionId: pulumi.interpolate`/subscriptions/${client.subscriptionId}/providers/Microsoft.Authorization/roleDefinitions/${keyVaultSecretsUser}`,
                scope: pulumi.interpolate`/subscriptions/${client.subscriptionId}/resourceGroups/${args.resourceGroupName}/providers/Microsoft.KeyVault/vaults/${args.secretsKeyVaultName}`,
            },
            { parent: this },
        );

        if (args.enableExternalSecrets) {
            new k8s.helm.v3.Release(
                `${name}-eso`,
                {
                    name: "external-secrets",
                    chart: "external-secrets",
                    version: args.externalSecretsChartVersion,
                    namespace: esoNamespace.metadata.name,
                    repositoryOpts: { repo: "https://charts.external-secrets.io" },
                    values: {
                        installCRDs: true,
                        serviceAccount: {
                            name: "external-secrets",
                            annotations: {
                                "azure.workload.identity/client-id": esoFederation.clientId,
                            },
                        },
                        podLabels: { "azure.workload.identity/use": "true" },
                    },
                },
                { provider: k8sProvider, parent: this },
            );
        }

        // Ingress path: Application Gateway for Containers (AGC).
        // Provision the managed ApplicationLoadBalancer + Frontend that the in-cluster
        // ALB Controller binds Gateway API resources against.
        const trafficController = new azure.servicenetworking.TrafficControllerInterface(
            `${name}-agc`,
            {
                trafficControllerName: pulumi.interpolate`${args.clusterName}-agc`,
                resourceGroupName: args.resourceGroupName,
                location: args.location,
                tags,
            },
            { parent: this },
        );
        new azure.servicenetworking.FrontendsInterface(
            `${name}-agc-frontend`,
            {
                frontendName: "default",
                trafficControllerName: trafficController.name,
                resourceGroupName: args.resourceGroupName,
                location: args.location,
            },
            { parent: this },
        );
        const agcFederation = federate("alb", "azure-alb-system", "alb-controller-sa");
        const networkContributor = "4d97b98b-1d4f-4787-a291-c67834d212e7";
        new azure.authorization.RoleAssignment(
            `${name}-alb-agc-reader`,
            {
                principalId: agcFederation.identity.principalId,
                principalType: "ServicePrincipal",
                roleDefinitionId: pulumi.interpolate`/subscriptions/${client.subscriptionId}/providers/Microsoft.Authorization/roleDefinitions/${networkContributor}`,
                scope: trafficController.id,
            },
            { parent: this },
        );

        if (args.enableIngressController) {
            new k8s.helm.v3.Release(
                `${name}-alb-controller`,
                {
                    name: "alb-controller",
                    chart: "alb-controller",
                    version: args.albControllerChartVersion,
                    namespace: ingressNamespace.metadata.name,
                    repositoryOpts: { repo: "oci://mcr.microsoft.com/application-lb/charts" },
                    values: {
                        albController: {
                            podIdentity: {
                                clientID: agcFederation.clientId,
                            },
                        },
                    },
                },
                { provider: k8sProvider, parent: this },
            );
        }

        this.managedCluster = cluster;
        this.kubeconfig = kubeconfig;
        this.clusterEndpoint = cluster.fqdn.apply((fqdn) => `https://${fqdn}`);
        this.clusterCertificateAuthority = adminCreds.kubeconfigs.apply((configs) => {
            const raw = Buffer.from(configs![0].value, "base64").toString("utf-8");
            const match = raw.match(/certificate-authority-data:\s*([^\s]+)/);
            return match ? match[1] : "";
        });
        this.oidcIssuerUrl = cluster.oidcIssuerProfile.apply((p) => p!.issuerURL!);
        this.tenantId = client.tenantId;
        this.externalSecretsIdentityClientId = esoFederation.clientId;
        this.ingressControllerIdentityClientId = agcFederation.clientId;
        this.trafficControllerId = trafficController.id;

        this.registerOutputs({
            kubeconfig: this.kubeconfig,
            clusterEndpoint: this.clusterEndpoint,
            oidcIssuerUrl: this.oidcIssuerUrl,
        });
    }
}

components/cluster.py

Provisions the AKS cluster, a system node pool sized for the controllers, workload-identity wiring (Azure AD Workload Identity), and the Helm releases for External Secrets Operator and the ingress controller for this cloud.

from __future__ import annotations

import base64
import re
from dataclasses import dataclass
from typing import Mapping, Optional

import pulumi
import pulumi_azure_native as azure_native
import pulumi_kubernetes as k8s


@dataclass
class ClusterArgs:
    cluster_name: pulumi.Input[str]
    resource_group_name: pulumi.Input[str]
    location: pulumi.Input[str]
    subnet_id: pulumi.Input[str]
    deployer_principal_id: pulumi.Input[str]
    secrets_key_vault_name: pulumi.Input[str]
    version: pulumi.Input[str]
    system_node_count: pulumi.Input[int]
    system_node_vm_size: pulumi.Input[str]
    enable_external_secrets: bool = True
    enable_ingress_controller: bool = True
    external_secrets_chart_version: str = ""
    alb_controller_chart_version: str = ""
    tags: Optional[Mapping[str, str]] = None


class Cluster(pulumi.ComponentResource):
    """Opinionated AKS cluster with OIDC, Workload Identity, Node Auto Provisioning,
    External Secrets Operator, and Application Gateway for Containers ingress."""

    def __init__(
        self,
        name: str,
        args: ClusterArgs,
        opts: Optional[pulumi.ResourceOptions] = None,
    ) -> None:
        super().__init__("kubernetes:azure:Cluster", name, {}, opts)
        tags = dict(args.tags or {})
        child = pulumi.ResourceOptions(parent=self)
        client = azure_native.authorization.get_client_config_output()

        cluster = azure_native.containerservice.ManagedCluster(
            f"{name}-aks",
            resource_name_=args.cluster_name,
            resource_group_name=args.resource_group_name,
            location=args.location,
            dns_prefix=args.cluster_name,
            kubernetes_version=args.version,
            sku=azure_native.containerservice.ManagedClusterSKUArgs(
                name="Base",
                tier="Standard",
            ),
            identity=azure_native.containerservice.ManagedClusterIdentityArgs(
                type=azure_native.containerservice.ResourceIdentityType.SYSTEM_ASSIGNED,
            ),
            oidc_issuer_profile=azure_native.containerservice.ManagedClusterOIDCIssuerProfileArgs(
                enabled=True,
            ),
            security_profile=azure_native.containerservice.ManagedClusterSecurityProfileArgs(
                workload_identity=azure_native.containerservice.ManagedClusterSecurityProfileWorkloadIdentityArgs(
                    enabled=True,
                ),
            ),
            node_provisioning_profile=azure_native.containerservice.ManagedClusterNodeProvisioningProfileArgs(
                mode="Auto",
            ),
            network_profile=azure_native.containerservice.ContainerServiceNetworkProfileArgs(
                network_plugin="azure",
                network_plugin_mode="overlay",
                network_dataplane="cilium",
                network_policy="cilium",
                service_cidr="10.100.0.0/16",
                dns_service_ip="10.100.0.10",
            ),
            agent_pool_profiles=[
                azure_native.containerservice.ManagedClusterAgentPoolProfileArgs(
                    name="system",
                    mode="System",
                    count=args.system_node_count,
                    vm_size=args.system_node_vm_size,
                    os_type="Linux",
                    os_sku="AzureLinux",
                    vnet_subnet_id=args.subnet_id,
                    type=azure_native.containerservice.AgentPoolType.VIRTUAL_MACHINE_SCALE_SETS,
                ),
            ],
            aad_profile=azure_native.containerservice.ManagedClusterAADProfileArgs(
                managed=True,
                enable_azure_rbac=True,
                tenant_id=client.tenant_id,
                admin_group_object_ids=[],
            ),
            tags=tags,
            opts=child,
        )

        deployer_role = "b1ff04bb-8a4e-4dc4-8eb5-8693973ce19b"  # AKS RBAC Cluster Admin
        azure_native.authorization.RoleAssignment(
            f"{name}-deployer-admin",
            principal_id=args.deployer_principal_id,
            principal_type="ServicePrincipal",
            role_definition_id=pulumi.Output.concat(
                "/subscriptions/", client.subscription_id,
                "/providers/Microsoft.Authorization/roleDefinitions/", deployer_role,
            ),
            scope=cluster.id,
            opts=child,
        )

        # Fetch the user kubeconfig (scoped through AAD RBAC) for the in-cluster provider.
        admin_creds = azure_native.containerservice.list_managed_cluster_user_credentials_output(
            resource_group_name=args.resource_group_name,
            resource_name=cluster.name,
        )

        def _first_kubeconfig(configs):
            return base64.b64decode(configs[0]["value"]).decode("utf-8")

        kubeconfig = admin_creds.kubeconfigs.apply(_first_kubeconfig)
        k8s_provider = k8s.Provider(
            f"{name}-k8s",
            kubeconfig=kubeconfig,
            enable_server_side_apply=True,
            opts=child,
        )
        k8s_opts = pulumi.ResourceOptions(parent=self, provider=k8s_provider)

        def federate(key: str, namespace: str, sa: str):
            identity = azure_native.managedidentity.UserAssignedIdentity(
                f"{name}-{key}-id",
                resource_group_name=args.resource_group_name,
                location=args.location,
                tags=tags,
                opts=child,
            )
            azure_native.managedidentity.FederatedIdentityCredential(
                f"{name}-{key}-fed",
                resource_group_name=args.resource_group_name,
                resource_name_=identity.name,
                issuer=cluster.oidc_issuer_profile.apply(lambda p: p["issuer_url"]),
                subject=f"system:serviceaccount:{namespace}:{sa}",
                audiences=["api://AzureADTokenExchange"],
                opts=child,
            )
            return identity

        psa = {
            "pod-security.kubernetes.io/enforce": "restricted",
            "pod-security.kubernetes.io/enforce-version": "latest",
            "pod-security.kubernetes.io/audit": "restricted",
            "pod-security.kubernetes.io/warn": "restricted",
        }

        eso_ns = k8s.core.v1.Namespace(
            f"{name}-eso-ns",
            metadata=k8s.meta.v1.ObjectMetaArgs(name="external-secrets", labels=psa),
            opts=k8s_opts,
        )
        alb_ns = k8s.core.v1.Namespace(
            f"{name}-ingress-ns",
            metadata=k8s.meta.v1.ObjectMetaArgs(name="azure-alb-system", labels=psa),
            opts=k8s_opts,
        )

        eso_identity = federate("eso", "external-secrets", "external-secrets")
        kv_secrets_user = "4633458b-17de-408a-b874-0445c86b69e6"
        azure_native.authorization.RoleAssignment(
            f"{name}-eso-kv-access",
            principal_id=eso_identity.principal_id,
            principal_type="ServicePrincipal",
            role_definition_id=pulumi.Output.concat(
                "/subscriptions/", client.subscription_id,
                "/providers/Microsoft.Authorization/roleDefinitions/", kv_secrets_user,
            ),
            scope=pulumi.Output.concat(
                "/subscriptions/", client.subscription_id,
                "/resourceGroups/", args.resource_group_name,
                "/providers/Microsoft.KeyVault/vaults/", args.secrets_key_vault_name,
            ),
            opts=child,
        )

        if args.enable_external_secrets:
            k8s.helm.v3.Release(
                f"{name}-eso",
                name="external-secrets",
                chart="external-secrets",
                version=args.external_secrets_chart_version,
                namespace=eso_ns.metadata["name"],
                repository_opts=k8s.helm.v3.RepositoryOptsArgs(
                    repo="https://charts.external-secrets.io",
                ),
                values={
                    "installCRDs": True,
                    "serviceAccount": {
                        "name": "external-secrets",
                        "annotations": {
                            "azure.workload.identity/client-id": eso_identity.client_id,
                        },
                    },
                    "podLabels": {"azure.workload.identity/use": "true"},
                },
                opts=k8s_opts,
            )

        traffic = azure_native.servicenetworking.TrafficControllerInterface(
            f"{name}-agc",
            traffic_controller_name=pulumi.Output.concat(args.cluster_name, "-agc"),
            resource_group_name=args.resource_group_name,
            location=args.location,
            tags=tags,
            opts=child,
        )
        azure_native.servicenetworking.FrontendsInterface(
            f"{name}-agc-frontend",
            frontend_name="default",
            traffic_controller_name=traffic.name,
            resource_group_name=args.resource_group_name,
            location=args.location,
            opts=child,
        )
        alb_identity = federate("alb", "azure-alb-system", "alb-controller-sa")
        network_contributor = "4d97b98b-1d4f-4787-a291-c67834d212e7"
        azure_native.authorization.RoleAssignment(
            f"{name}-alb-agc",
            principal_id=alb_identity.principal_id,
            principal_type="ServicePrincipal",
            role_definition_id=pulumi.Output.concat(
                "/subscriptions/", client.subscription_id,
                "/providers/Microsoft.Authorization/roleDefinitions/", network_contributor,
            ),
            scope=traffic.id,
            opts=child,
        )

        if args.enable_ingress_controller:
            k8s.helm.v3.Release(
                f"{name}-alb-controller",
                name="alb-controller",
                chart="alb-controller",
                version=args.alb_controller_chart_version,
                namespace=alb_ns.metadata["name"],
                repository_opts=k8s.helm.v3.RepositoryOptsArgs(
                    repo="oci://mcr.microsoft.com/application-lb/charts",
                ),
                values={
                    "albController": {
                        "podIdentity": {
                            "clientID": alb_identity.client_id,
                        },
                    },
                },
                opts=k8s_opts,
            )

        def _extract_ca(configs):
            raw = base64.b64decode(configs[0]["value"]).decode("utf-8")
            match = re.search(r"certificate-authority-data:\s*([^\s]+)", raw)
            return match.group(1) if match else ""

        self.managed_cluster = cluster
        self.kubeconfig = kubeconfig
        self.cluster_endpoint = cluster.fqdn.apply(lambda fqdn: f"https://{fqdn}")
        self.cluster_certificate_authority = admin_creds.kubeconfigs.apply(_extract_ca)
        self.oidc_issuer_url = cluster.oidc_issuer_profile.apply(lambda p: p["issuer_url"])
        self.tenant_id = client.tenant_id
        self.external_secrets_identity_client_id = eso_identity.client_id
        self.ingress_controller_identity_client_id = alb_identity.client_id
        self.traffic_controller_id = traffic.id

        self.register_outputs(
            {
                "kubeconfig": self.kubeconfig,
                "cluster_endpoint": self.cluster_endpoint,
                "oidc_issuer_url": self.oidc_issuer_url,
            }
        )

cluster/cluster.go

Provisions the AKS cluster, a system node pool sized for the controllers, workload-identity wiring (Azure AD Workload Identity), and the Helm releases for External Secrets Operator and the ingress controller for this cloud.

package cluster

import (
	"encoding/base64"
	"fmt"
	"regexp"

	authorization "github.com/pulumi/pulumi-azure-native-sdk/authorization/v3"
	containerservice "github.com/pulumi/pulumi-azure-native-sdk/containerservice/v3"
	managedidentity "github.com/pulumi/pulumi-azure-native-sdk/managedidentity/v3"
	servicenetworking "github.com/pulumi/pulumi-azure-native-sdk/servicenetworking/v3"
	helm "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/helm/v3"
	corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
	metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
	k8s "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

type Args struct {
	ClusterName                 pulumi.StringInput
	ResourceGroupName           pulumi.StringInput
	Location                    pulumi.StringInput
	SubnetId                    pulumi.StringInput
	DeployerPrincipalId         pulumi.StringInput
	SecretsKeyVaultName         pulumi.StringInput
	Version                     pulumi.StringInput
	SystemNodeCount             pulumi.IntInput
	SystemNodeVmSize            pulumi.StringInput
	EnableExternalSecrets       bool
	EnableIngressController     bool
	ExternalSecretsChartVersion string
	AlbControllerChartVersion   string
	Tags                        pulumi.StringMapInput
}

type Cluster struct {
	pulumi.ResourceState

	ManagedCluster                     *containerservice.ManagedCluster
	Kubeconfig                         pulumi.StringOutput
	ClusterEndpoint                    pulumi.StringOutput
	ClusterCertificateAuthority        pulumi.StringOutput
	OidcIssuerUrl                      pulumi.StringOutput
	TenantId                           pulumi.StringOutput
	ExternalSecretsIdentityClientId    pulumi.StringOutput
	IngressControllerIdentityClientId  pulumi.StringOutput
	TrafficControllerId                pulumi.IDOutput
}

func New(ctx *pulumi.Context, name string, args *Args, opts ...pulumi.ResourceOption) (*Cluster, error) {
	c := &Cluster{}
	if err := ctx.RegisterComponentResource("kubernetes:azure:Cluster", name, c, opts...); err != nil {
		return nil, err
	}
	parent := pulumi.Parent(c)

	client := authorization.GetClientConfigOutput(ctx, parent)

	cluster, err := containerservice.NewManagedCluster(ctx, fmt.Sprintf("%s-aks", name), &containerservice.ManagedClusterArgs{
		ResourceName:      args.ClusterName,
		ResourceGroupName: args.ResourceGroupName,
		Location:          args.Location,
		DnsPrefix:         args.ClusterName,
		KubernetesVersion: args.Version,
		Sku: &containerservice.ManagedClusterSKUArgs{
			Name: pulumi.String("Base"),
			Tier: pulumi.String("Standard"),
		},
		Identity: &containerservice.ManagedClusterIdentityArgs{
			Type: containerservice.ResourceIdentityTypeSystemAssigned,
		},
		OidcIssuerProfile: &containerservice.ManagedClusterOIDCIssuerProfileArgs{
			Enabled: pulumi.BoolPtr(true),
		},
		SecurityProfile: &containerservice.ManagedClusterSecurityProfileArgs{
			WorkloadIdentity: &containerservice.ManagedClusterSecurityProfileWorkloadIdentityArgs{
				Enabled: pulumi.BoolPtr(true),
			},
		},
		NodeProvisioningProfile: &containerservice.ManagedClusterNodeProvisioningProfileArgs{
			Mode: pulumi.String("Auto"),
		},
		NetworkProfile: &containerservice.ContainerServiceNetworkProfileArgs{
			NetworkPlugin:      pulumi.String("azure"),
			NetworkPluginMode:  pulumi.String("overlay"),
			NetworkDataplane:   pulumi.String("cilium"),
			NetworkPolicy:      pulumi.String("cilium"),
			ServiceCidr:        pulumi.String("10.100.0.0/16"),
			DnsServiceIP:       pulumi.String("10.100.0.10"),
		},
		AgentPoolProfiles: containerservice.ManagedClusterAgentPoolProfileArray{
			&containerservice.ManagedClusterAgentPoolProfileArgs{
				Name:         pulumi.String("system"),
				Mode:         pulumi.String("System"),
				Count:        args.SystemNodeCount,
				VmSize:       args.SystemNodeVmSize,
				OsType:       pulumi.String("Linux"),
				OsSKU:        pulumi.String("AzureLinux"),
				VnetSubnetID: args.SubnetId,
				Type:         containerservice.AgentPoolTypeVirtualMachineScaleSets,
			},
		},
		AadProfile: &containerservice.ManagedClusterAADProfileArgs{
			Managed:             pulumi.BoolPtr(true),
			EnableAzureRBAC:     pulumi.BoolPtr(true),
			TenantID:            client.TenantId(),
			AdminGroupObjectIDs: pulumi.StringArray{},
		},
		Tags: args.Tags,
	}, parent)
	if err != nil {
		return nil, err
	}

	subscriptionId := client.SubscriptionId()

	// Built-in "Azure Kubernetes Service RBAC Cluster Admin" role, granted to the
	// deploying principal so it can manage in-cluster objects through Azure RBAC.
	deployerRoleDef := subscriptionId.ApplyT(func(sub string) string {
		return fmt.Sprintf("/subscriptions/%s/providers/Microsoft.Authorization/roleDefinitions/b1ff04bb-8a4e-4dc4-8eb5-8693973ce19b", sub)
	}).(pulumi.StringOutput)
	if _, err := authorization.NewRoleAssignment(ctx, fmt.Sprintf("%s-deployer-admin", name), &authorization.RoleAssignmentArgs{
		PrincipalId:      args.DeployerPrincipalId,
		PrincipalType:    pulumi.String("ServicePrincipal"),
		RoleDefinitionId: deployerRoleDef,
		Scope:            cluster.ID().ToStringOutput(),
	}, parent); err != nil {
		return nil, err
	}

	// AKS returns user kubeconfigs base64-encoded; decode the first credential so it
	// can drive an in-process Kubernetes provider and be exported downstream.
	creds := containerservice.ListManagedClusterUserCredentialsOutput(ctx, containerservice.ListManagedClusterUserCredentialsOutputArgs{
		ResourceGroupName: args.ResourceGroupName,
		ResourceName:      cluster.Name,
	}, parent)
	kubeconfig := creds.Kubeconfigs().Index(pulumi.Int(0)).Value().ApplyT(func(raw string) (string, error) {
		if raw == "" {
			return "", fmt.Errorf("no kubeconfig returned")
		}
		decoded, err := base64.StdEncoding.DecodeString(raw)
		if err != nil {
			return "", err
		}
		return string(decoded), nil
	}).(pulumi.StringOutput)

	k8sProvider, err := k8s.NewProvider(ctx, fmt.Sprintf("%s-k8s", name), &k8s.ProviderArgs{
		Kubeconfig:            kubeconfig,
		EnableServerSideApply: pulumi.BoolPtr(true),
	}, parent)
	if err != nil {
		return nil, err
	}
	k8sOpts := append([]pulumi.ResourceOption{pulumi.Provider(k8sProvider)}, parent)

	// federate creates a user-assigned managed identity plus a federated credential
	// that trusts the cluster's OIDC issuer for the given service account: the
	// Azure AD Workload Identity binding each add-on uses.
	federate := func(key, namespace, sa string) (*managedidentity.UserAssignedIdentity, error) {
		identity, err := managedidentity.NewUserAssignedIdentity(ctx, fmt.Sprintf("%s-%s-id", name, key), &managedidentity.UserAssignedIdentityArgs{
			ResourceGroupName: args.ResourceGroupName,
			Location:          args.Location,
			Tags:              args.Tags,
		}, parent)
		if err != nil {
			return nil, err
		}
		subject := fmt.Sprintf("system:serviceaccount:%s:%s", namespace, sa)
		if _, err := managedidentity.NewFederatedIdentityCredential(ctx, fmt.Sprintf("%s-%s-fed", name, key), &managedidentity.FederatedIdentityCredentialArgs{
			ResourceGroupName: args.ResourceGroupName,
			ResourceName:      identity.Name,
			Issuer: cluster.OidcIssuerProfile.ApplyT(func(p *containerservice.ManagedClusterOIDCIssuerProfileResponse) string {
				if p == nil {
					return ""
				}
				return p.IssuerURL
			}).(pulumi.StringOutput),
			Subject:   pulumi.String(subject),
			Audiences: pulumi.StringArray{pulumi.String("api://AzureADTokenExchange")},
		}, parent); err != nil {
			return nil, err
		}
		return identity, nil
	}

	// Restricted Pod Security Admission labels applied to every add-on namespace.
	psa := pulumi.StringMap{
		"pod-security.kubernetes.io/enforce":         pulumi.String("restricted"),
		"pod-security.kubernetes.io/enforce-version": pulumi.String("latest"),
		"pod-security.kubernetes.io/audit":           pulumi.String("restricted"),
		"pod-security.kubernetes.io/warn":            pulumi.String("restricted"),
	}
	esoNs, err := corev1.NewNamespace(ctx, fmt.Sprintf("%s-eso-ns", name), &corev1.NamespaceArgs{
		Metadata: &metav1.ObjectMetaArgs{Name: pulumi.String("external-secrets"), Labels: psa},
	}, k8sOpts...)
	if err != nil {
		return nil, err
	}
	albNs, err := corev1.NewNamespace(ctx, fmt.Sprintf("%s-ingress-ns", name), &corev1.NamespaceArgs{
		Metadata: &metav1.ObjectMetaArgs{Name: pulumi.String("azure-alb-system"), Labels: psa},
	}, k8sOpts...)
	if err != nil {
		return nil, err
	}

	esoIdentity, err := federate("eso", "external-secrets", "external-secrets")
	if err != nil {
		return nil, err
	}
	// Built-in "Key Vault Secrets User" role, scoped below to the landing zone's
	// Key Vault so External Secrets Operator can read secret values.
	kvSecretsUser := subscriptionId.ApplyT(func(sub string) string {
		return fmt.Sprintf("/subscriptions/%s/providers/Microsoft.Authorization/roleDefinitions/4633458b-17de-408a-b874-0445c86b69e6", sub)
	}).(pulumi.StringOutput)
	kvScope := pulumi.All(subscriptionId, args.ResourceGroupName, args.SecretsKeyVaultName).ApplyT(func(parts []interface{}) string {
		return fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.KeyVault/vaults/%s", parts[0], parts[1], parts[2])
	}).(pulumi.StringOutput)
	if _, err := authorization.NewRoleAssignment(ctx, fmt.Sprintf("%s-eso-kv-access", name), &authorization.RoleAssignmentArgs{
		PrincipalId:      esoIdentity.PrincipalId,
		PrincipalType:    pulumi.String("ServicePrincipal"),
		RoleDefinitionId: kvSecretsUser,
		Scope:            kvScope,
	}, parent); err != nil {
		return nil, err
	}

	if args.EnableExternalSecrets {
		if _, err := helm.NewRelease(ctx, fmt.Sprintf("%s-eso", name), &helm.ReleaseArgs{
			Name:      pulumi.String("external-secrets"),
			Chart:     pulumi.String("external-secrets"),
			Version:   pulumi.String(args.ExternalSecretsChartVersion),
			Namespace: esoNs.Metadata.Name(),
			RepositoryOpts: &helm.RepositoryOptsArgs{
				Repo: pulumi.String("https://charts.external-secrets.io"),
			},
			Values: pulumi.Map{
				"installCRDs": pulumi.Bool(true),
				"serviceAccount": pulumi.Map{
					"name": pulumi.String("external-secrets"),
					"annotations": pulumi.Map{
						"azure.workload.identity/client-id": esoIdentity.ClientId,
					},
				},
				"podLabels": pulumi.Map{"azure.workload.identity/use": pulumi.String("true")},
			},
		}, k8sOpts...); err != nil {
			return nil, err
		}
	}

	// Cloud-side Application Gateway for Containers data plane: a traffic controller
	// plus a default frontend that the ALB Controller associates with the cluster.
	trafficName := args.ClusterName.ToStringOutput().ApplyT(func(v string) string { return v + "-agc" }).(pulumi.StringOutput)
	traffic, err := servicenetworking.NewTrafficControllerInterface(ctx, fmt.Sprintf("%s-agc", name), &servicenetworking.TrafficControllerInterfaceArgs{
		TrafficControllerName: trafficName,
		ResourceGroupName:     args.ResourceGroupName,
		Location:              args.Location,
		Tags:                  args.Tags,
	}, parent)
	if err != nil {
		return nil, err
	}
	if _, err := servicenetworking.NewFrontendsInterface(ctx, fmt.Sprintf("%s-agc-frontend", name), &servicenetworking.FrontendsInterfaceArgs{
		FrontendName:          pulumi.String("default"),
		TrafficControllerName: traffic.Name,
		ResourceGroupName:     args.ResourceGroupName,
		Location:              args.Location,
	}, parent); err != nil {
		return nil, err
	}
	albIdentity, err := federate("alb", "azure-alb-system", "alb-controller-sa")
	if err != nil {
		return nil, err
	}
	// Built-in "Network Contributor" role on the traffic controller, so the ALB
	// Controller can attach the cluster to the Application Gateway for Containers data plane.
	networkContributor := subscriptionId.ApplyT(func(sub string) string {
		return fmt.Sprintf("/subscriptions/%s/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f-4787-a291-c67834d212e7", sub)
	}).(pulumi.StringOutput)
	if _, err := authorization.NewRoleAssignment(ctx, fmt.Sprintf("%s-alb-agc", name), &authorization.RoleAssignmentArgs{
		PrincipalId:      albIdentity.PrincipalId,
		PrincipalType:    pulumi.String("ServicePrincipal"),
		RoleDefinitionId: networkContributor,
		Scope:            traffic.ID().ToStringOutput(),
	}, parent); err != nil {
		return nil, err
	}

	if args.EnableIngressController {
		// The ALB Controller chart is published to an OCI registry, so the full OCI
		// reference goes in Chart; RepositoryOpts is for classic HTTP chart repositories.
		if _, err := helm.NewRelease(ctx, fmt.Sprintf("%s-alb-controller", name), &helm.ReleaseArgs{
			Name:      pulumi.String("alb-controller"),
			Chart:     pulumi.String("oci://mcr.microsoft.com/application-lb/charts/alb-controller"),
			Version:   pulumi.String(args.AlbControllerChartVersion),
			Namespace: albNs.Metadata.Name(),
			Values: pulumi.Map{
				"albController": pulumi.Map{
					"podIdentity": pulumi.Map{
						"clientID": albIdentity.ClientId,
					},
				},
			},
		}, k8sOpts...); err != nil {
			return nil, err
		}
	}

	// Pull the base64-encoded CA bundle out of the kubeconfig so downstream stacks
	// can build clients without parsing the full YAML.
	caRe := regexp.MustCompile(`certificate-authority-data:\s*(\S+)`)
	ca := kubeconfig.ApplyT(func(raw string) string {
		m := caRe.FindStringSubmatch(raw)
		if len(m) > 1 {
			return m[1]
		}
		return ""
	}).(pulumi.StringOutput)

	c.ManagedCluster = cluster
	c.Kubeconfig = kubeconfig
	c.ClusterEndpoint = cluster.Fqdn.ApplyT(func(fqdn string) string {
		if fqdn == "" {
			return ""
		}
		return "https://" + fqdn
	}).(pulumi.StringOutput)
	c.ClusterCertificateAuthority = ca
	c.OidcIssuerUrl = cluster.OidcIssuerProfile.ApplyT(func(p *containerservice.ManagedClusterOIDCIssuerProfileResponse) string {
		if p == nil {
			return ""
		}
		return p.IssuerURL
	}).(pulumi.StringOutput)
	c.TenantId = client.TenantId()
	c.ExternalSecretsIdentityClientId = esoIdentity.ClientId
	c.IngressControllerIdentityClientId = albIdentity.ClientId
	c.TrafficControllerId = traffic.ID()

	if err := ctx.RegisterResourceOutputs(c, pulumi.Map{
		"kubeconfig":       c.Kubeconfig,
		"clusterEndpoint":  c.ClusterEndpoint,
		"oidcIssuerUrl":    c.OidcIssuerUrl,
	}); err != nil {
		return nil, err
	}
	return c, nil
}
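
The component registers itself as kubernetes:azure:Cluster and surfaces its outputs as struct fields, so an entrypoint only has to wire landing-zone outputs into Args. A minimal sketch of such an entrypoint, assuming a local module path of example.com/blueprint/cluster and illustrative config keys and landing-zone output names (match them to your own stacks):

package main

import (
	"example.com/blueprint/cluster" // assumed module path for the component above

	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		cfg := config.New(ctx, "")

		// Landing-zone stack set with: pulumi config set landingZoneStack <your-org>/landing-zone/dev
		lz, err := pulumi.NewStackReference(ctx, cfg.Require("landingZoneStack"), nil)
		if err != nil {
			return err
		}

		aks, err := cluster.New(ctx, "aks", &cluster.Args{
			ClusterName:                 pulumi.String(cfg.Require("clusterName")),
			ResourceGroupName:           lz.GetStringOutput(pulumi.String("resourceGroupName")),
			Location:                    lz.GetStringOutput(pulumi.String("location")),
			SubnetId:                    lz.GetStringOutput(pulumi.String("aksSubnetId")),
			DeployerPrincipalId:         lz.GetStringOutput(pulumi.String("deployerPrincipalId")),
			SecretsKeyVaultName:         lz.GetStringOutput(pulumi.String("secretsKeyVaultName")),
			Version:                     pulumi.String(cfg.Require("clusterVersion")),
			SystemNodeCount:             pulumi.Int(cfg.RequireInt("systemNodeCount")),
			SystemNodeVmSize:            pulumi.String(cfg.Require("systemNodeVmSize")),
			EnableExternalSecrets:       true,
			EnableIngressController:     true,
			ExternalSecretsChartVersion: cfg.Require("externalSecretsChartVersion"),
			AlbControllerChartVersion:   cfg.Require("albControllerChartVersion"),
		})
		if err != nil {
			return err
		}

		ctx.Export("kubeconfig", pulumi.ToSecret(aks.Kubeconfig))
		ctx.Export("clusterEndpoint", aks.ClusterEndpoint)
		ctx.Export("clusterOidcIssuerUrl", aks.OidcIssuerUrl)
		return nil
	})
}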

Frequently asked questions

Do I need the Pulumi landing-zone stack first?
Yes. The blueprint consumes landing-zone outputs (network ids, key ids, deployer identity) through a StackReference. Deploy the landing-zone family in the same cloud account first, then point this stack at it with pulumi config set landingZoneStack <your-org>/landing-zone/dev. If you want to bring your own network, replace the StackReference block in the entrypoint with the ids you already have.
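For the bring-your-own-network case, the replacement is a handful of literal IDs in place of the StackReference lookups. A fragment of the entrypoint sketch above, with placeholder values:

		// Bring-your-own-network variant: no StackReference, pass the IDs you already have.
		aks, err := cluster.New(ctx, "aks", &cluster.Args{
			ClusterName:       pulumi.String("platform-aks"),
			ResourceGroupName: pulumi.String("rg-platform"),
			Location:          pulumi.String("westeurope"),
			SubnetId:          pulumi.String("/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>"),
			// ...remaining fields exactly as in the entrypoint sketch above.
		})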
Which add-ons does this blueprint install?
External Secrets Operator (for syncing cloud-native secret stores into the cluster), a cloud-native Layer-7 ingress path (AWS Load Balancer Controller on EKS, Application Gateway for Containers on AKS, GKE Gateway API on GKE), and a cloud-native node autoscaler (Karpenter on EKS, Node Auto Provisioning on AKS and GKE). All are installed through pinned Helm charts or managed-cluster config. Each add-on has a config flag so you can disable any of them on pulumi up.
How does workload identity work here?
On AWS the blueprint creates IRSA (IAM Roles for Service Accounts) roles scoped per service account and binds them through OIDC federation. On AKS it turns on Workload Identity + OIDC and wires FederatedIdentityCredential resources per service account. On GKE it enables Workload Identity Federation on the cluster and annotates each controller’s service account so it maps to a scoped Google service account. In every case pods call cloud APIs with short-lived tokens, never with static credentials.
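On this blueprint's AKS variant a downstream workload binds to an identity the same way the add-ons do: annotate its service account with the identity's client ID and label its pods. A minimal sketch (package, names, and namespace are illustrative; the client ID must belong to a user-assigned identity that already has a FederatedIdentityCredential for that service account on the cluster's OIDC issuer):

package workload

import (
	k8s "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes"
	corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
	metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

// workloadIdentityServiceAccount wires a workload's service account for Azure AD
// Workload Identity. clientId is the client ID of the user-assigned identity the
// pods should act as.
func workloadIdentityServiceAccount(ctx *pulumi.Context, provider *k8s.Provider, clientId pulumi.StringInput) error {
	_, err := corev1.NewServiceAccount(ctx, "orders-sa", &corev1.ServiceAccountArgs{
		Metadata: &metav1.ObjectMetaArgs{
			Name:      pulumi.String("orders"),
			Namespace: pulumi.String("orders"),
			Annotations: pulumi.StringMap{
				"azure.workload.identity/client-id": clientId,
			},
		},
	}, pulumi.Provider(provider))
	// Pods that use this service account must also carry the label
	// azure.workload.identity/use: "true" so the token volume is projected.
	return err
}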
How do I consume the cluster from another Pulumi project?
The stack exports kubeconfig, clusterName, clusterEndpoint, and clusterOidcIssuerUrl plus an escEnvironment name. Downstream workload stacks either import the Pulumi ESC environment this stack attaches to, or use a StackReference to pull those outputs. The Consume the cluster section shows both patterns with TypeScript, Python, and Go examples.
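A minimal Go sketch of the StackReference path (the stack name my-org/aks-cluster/dev is illustrative; the output names match this stack's exports):

package main

import (
	k8s "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Reference the cluster stack by name.
		clusterStack, err := pulumi.NewStackReference(ctx, "my-org/aks-cluster/dev", nil)
		if err != nil {
			return err
		}

		// Build a Kubernetes provider from the exported kubeconfig so workload
		// resources in this stack land on the blueprint's cluster.
		provider, err := k8s.NewProvider(ctx, "aks", &k8s.ProviderArgs{
			Kubeconfig:            clusterStack.GetStringOutput(pulumi.String("kubeconfig")),
			EnableServerSideApply: pulumi.BoolPtr(true),
		})
		if err != nil {
			return err
		}
		_ = provider // pass pulumi.Provider(provider) to every Kubernetes resource

		return nil
	})
}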
How do I upgrade Kubernetes versions later?
Bump the clusterVersion config value and run pulumi up. EKS and AKS upgrade the managed control plane in place; GKE follows the release channel you selected. Node pools refresh behind the same config value - Karpenter rolls AMIs per NodeClass, AKS NAP rolls through its AKSNodeClass, and GKE Node Auto Provisioning rolls through its NodePool templates.
What does this cost?
The control plane is a per-hour charge on every cloud even when no workloads are running. Add the system node pool, any network egress from the landing-zone network, and the Layer-7 data-plane service when you start creating Ingress / Gateway / HTTPRoute resources. This blueprint does not deploy application workloads, so the baseline is the control plane plus the system node pool.