
Kubernetes cluster services

    Cluster services are general services scoped at the Kubernetes cluster level. At a minimum, they tend to include logging and monitoring for the whole cluster or for a subset of apps and workloads. They can also include policy enforcement and service meshes.

    The full code for the AWS cluster services is on GitHub.

    The full code for the Azure cluster services is on GitHub.

    GKE logging and monitoring is managed by Google Cloud through Stackdriver.

    The repo for the Google Cloud cluster services is on GitHub, but it is empty since no extra steps are required after cluster and Node Pool creation in the Cluster Configuration stack.

    The full code for the general cluster services is on GitHub.

    Overview

    We’ll explore how to set up AWS logging and monitoring.

    See the official AWS docs for more details.

    Prerequisites

    Authenticate as the admins role from the Identity stack.

    $ aws sts assume-role --role-arn `pulumi stack output adminsIamRoleArn` --role-session-name k8s-admin
    $ export KUBECONFIG=`pwd`/kubeconfig-admin.json
    

    AWS Logging

    Control Plane

    In the Recommended Settings of Creating the Control Plane, we enabled cluster logging for the various controllers of the control plane.

    To view these logs, go to the CloudWatch console, navigate to the logs in your region, and look for the following group.

    /aws/eks/Cluster_Name/cluster
    

    The cluster name can be retrieved from the cluster stack output.

    $ pulumi stack output clusterName
    

    Worker Nodes and Pods

    Configure Worker Node IAM Policy

    To work with CloudWatch Logs, the identities created in Identity for each worker node group must have the proper permissions in IAM.
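
    The policy used in this guide grants `logs:*` on every log resource. As a rough illustration of what a narrower grant could look like, the sketch below builds a policy document limited to one region and account. The action list is an assumption about what a log shipper such as fluentd-cloudwatch typically needs, and the function name, region, and account ID are hypothetical placeholders.

```typescript
// Minimal types for an IAM policy document.
interface PolicyStatement {
    Effect: "Allow" | "Deny";
    Action: string[];
    Resource: string[];
}
interface PolicyDocument {
    Version: "2012-10-17";
    Statement: PolicyStatement[];
}

// Assumption: the CloudWatch Logs actions a log shipper usually needs.
// Check this list against the agent you actually deploy.
const logShipperActions: string[] = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:PutLogEvents",
];

// Hypothetical helper: build a policy scoped to one region and account.
function logShipperPolicy(region: string, accountId: string): PolicyDocument {
    return {
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: logShipperActions,
            Resource: [`arn:aws:logs:${region}:${accountId}:*`],
        }],
    };
}

// Placeholder region and account ID for illustration only.
const policyJson = JSON.stringify(logShipperPolicy("us-west-2", "123456789012"), null, 2);
console.log(policyJson);
```

    The resulting string could be passed as the `policy` argument of an `aws.iam.Policy` in place of the broader document.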

    Attach the permissions to the IAM role for each nodegroup.

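    import * as aws from "@pulumi/aws";
    import * as pulumi from "@pulumi/pulumi";
    import * as config from "./config"; // Assumed: local module exposing the stack's configuration values.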
    
    // Parse out the role names e.g. `roleName-123456` from `arn:aws:iam::123456789012:role/roleName-123456`
    const stdNodegroupIamRoleName = config.stdNodegroupIamRoleArn.apply(s => s.split("/")).apply(s => s[1])
    const perfNodegroupIamRoleName = config.perfNodegroupIamRoleArn.apply(s => s.split("/")).apply(s => s[1])
    
    // Create a new IAM Policy for fluentd-cloudwatch to manage CloudWatch Logs.
    const name = "fluentd-cloudwatch";
    const fluentdCloudWatchPolicy = new aws.iam.Policy(name,
        {
            description: "Allows fluentd-cloudwatch to work with CloudWatch Logs.",
            policy: JSON.stringify(
                {
                    Version: "2012-10-17",
                    Statement: [{Effect: "Allow", Action: ["logs:*"], Resource: ["arn:aws:logs:*:*:*"]}]
                }
            )
        },
    );
    
    // Attach the CloudWatch Logs policy to a role, identified by its name.
    function attachLogPolicies(name: string, roleName: pulumi.Input<string>) {
        new aws.iam.RolePolicyAttachment(name,
            { policyArn: fluentdCloudWatchPolicy.arn, role: roleName },
        );
    }
    
    attachLogPolicies("stdRpa", stdNodegroupIamRoleName);
    attachLogPolicies("perfRpa", perfNodegroupIamRoleName);
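
    The role name parsing above takes the second `/`-separated segment of the ARN, which works when the role has no IAM path. A minimal sketch of a variant that also tolerates a path such as `role/eks/roleName-123456` is shown below. The function name `roleNameFromArn` is hypothetical, and the sketch operates on plain strings rather than Pulumi outputs.

```typescript
// Hypothetical helper: extract the role name from an IAM role ARN.
// The role name is always the last "/"-separated segment of the resource part.
function roleNameFromArn(arn: string): string {
    const parts = arn.split(":");
    // The resource is the sixth ":"-separated field, e.g. "role/roleName-123456".
    const resource = parts.length > 5 ? parts[5] : "";
    if (resource.indexOf("role/") !== 0) {
        throw new Error(`not an IAM role ARN: ${arn}`);
    }
    const segments = resource.split("/");
    return segments[segments.length - 1];
}

const plainRole = roleNameFromArn("arn:aws:iam::123456789012:role/roleName-123456");
const pathedRole = roleNameFromArn("arn:aws:iam::123456789012:role/eks/roleName-123456");
console.log(plainRole, pathedRole);
```

    With a Pulumi output, the same function could be applied as `config.stdNodegroupIamRoleArn.apply(roleNameFromArn)`.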
    

    Using the YAML manifests in the AWS samples, we can provision fluentd-cloudwatch to run as a DaemonSet and send worker and app logs to CloudWatch Logs.

    Install fluentd

    Create a Namespace.

    $ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cloudwatch-namespace.yaml
    

    Create a ConfigMap.

    $ kubectl create configmap cluster-info --from-literal=cluster.name=`pulumi stack output clusterName` --from-literal=logs.region=`pulumi stack output region` -n amazon-cloudwatch
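
    For reference, the command above produces a ConfigMap with two data keys. The sketch below builds the equivalent manifest as a plain object, which can help when checking what fluentd will read. The function name and the sample cluster name and region are hypothetical.

```typescript
// Minimal shape of the ConfigMap created by the kubectl command.
interface ConfigMapManifest {
    apiVersion: "v1";
    kind: "ConfigMap";
    metadata: { name: string; namespace: string };
    data: Record<string, string>;
}

// Hypothetical helper: each --from-literal=key=value flag becomes one entry in `data`.
function clusterInfoConfigMap(clusterName: string, region: string): ConfigMapManifest {
    return {
        apiVersion: "v1",
        kind: "ConfigMap",
        metadata: { name: "cluster-info", namespace: "amazon-cloudwatch" },
        data: {
            "cluster.name": clusterName,
            "logs.region": region,
        },
    };
}

// Placeholder values; in practice these come from `pulumi stack output`.
const clusterInfo = clusterInfoConfigMap("my-cluster", "us-west-2");
console.log(JSON.stringify(clusterInfo, null, 2));
```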
    

    Deploy the DaemonSet.

    $ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/fluentd/fluentd.yaml
    

    Validate the deployment.

    $ kubectl get pods -n amazon-cloudwatch
    

    Verify the fluentd setup in the CloudWatch console by navigating to the logs in your region, and looking for the following groups.

    /aws/containerinsights/Cluster_Name/application
    /aws/containerinsights/Cluster_Name/host
    /aws/containerinsights/Cluster_Name/dataplane
    

    The cluster name can be retrieved from the cluster stack output.

    $ pulumi stack output clusterName
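
    The three group names listed above follow one pattern, so they can be derived from the cluster name. A small sketch, with a hypothetical function name and a placeholder cluster name:

```typescript
// Hypothetical helper: derive the Container Insights log group names
// that fluentd writes to for a given cluster.
function containerInsightsLogGroups(clusterName: string): string[] {
    return ["application", "host", "dataplane"].map(
        kind => `/aws/containerinsights/${clusterName}/${kind}`,
    );
}

const logGroups = containerInsightsLogGroups("my-cluster");
console.log(logGroups.join("\n"));
```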
    

    Clean Up.

    $ kubectl delete ns amazon-cloudwatch
    

    Using the Helm chart, we can provision fluentd-cloudwatch in Pulumi to run as a DaemonSet and send worker and app logs to CloudWatch Logs.

    Install fluentd

    Deploy the Chart into the cluster-svcs namespace created in Configure Cluster Defaults.

    import * as k8s from "@pulumi/kubernetes";
    
    // Create a new provider to the cluster using the cluster's kubeconfig.
    const provider = new k8s.Provider("provider", {kubeconfig: config.kubeconfig});
    
    // Create a new CloudWatch Log group for fluentd-cloudwatch.
    const fluentdCloudWatchLogGroup = new aws.cloudwatch.LogGroup(name);
    export let fluentdCloudWatchLogGroupName = fluentdCloudWatchLogGroup.name;
    
    // Deploy fluentd-cloudwatch using the Helm chart.
    const fluentdCloudwatch = new k8s.helm.v3.Chart(name,
        {
            namespace: config.clusterSvcsNamespaceName,
            chart: "fluentd-cloudwatch",
            version: "0.11.0",
            fetchOpts: {
                repo: "https://charts.helm.sh/incubator",
            },
            values: {
                extraVars: [ "{ name: FLUENT_UID, value: '0' }" ],
                rbac: {create: true},
                awsRegion: aws.config.region,
                logGroupName: fluentdCloudWatchLogGroup.name,
            },
            transformations: [
                (obj: any) => {
                    // Do transformations on the YAML to set the namespace
                    if (obj.metadata) {
                        obj.metadata.namespace = config.clusterSvcsNamespaceName;
                    }
                },
            ],
        },
        {providers: { kubernetes: provider }},
    );
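
    The `transformations` entry is a function that mutates each rendered object before it is created. The sketch below mimics that behavior on plain objects to show its effect: objects with `metadata` get the namespace set or overwritten, and objects without `metadata` are left untouched. The `Manifest` type, the `setNamespace` helper, and the sample objects are hypothetical stand-ins, not Pulumi types.

```typescript
// Simplified stand-in for a rendered Kubernetes object.
type Manifest = {
    kind: string;
    metadata?: { name: string; namespace?: string };
};

// Hypothetical helper: returns a transformation that forces a namespace.
function setNamespace(namespace: string): (obj: Manifest) => void {
    return obj => {
        if (obj.metadata) {
            obj.metadata.namespace = namespace;
        }
    };
}

// Sample objects: one without a namespace, one with a different one, one without metadata.
const rendered: Manifest[] = [
    { kind: "DaemonSet", metadata: { name: "fluentd-cloudwatch" } },
    { kind: "ServiceAccount", metadata: { name: "fluentd-cloudwatch", namespace: "default" } },
    { kind: "List" },
];

rendered.forEach(setNamespace("cluster-svcs"));
const namespaces = rendered.map(m => m.metadata?.namespace);
console.log(namespaces);
```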
    

    Validate the deployment.

    $ kubectl get pods -n `pulumi stack output clusterSvcsNamespaceName`
    

    Verify the fluentd setup in the CloudWatch console by navigating to the logs in your region, and looking for the following group.

    $ pulumi stack output fluentdCloudWatchLogGroupName
    

    Note: CloudWatch is rate limited, and the volume of data being sent can often trigger ThrottlingException error="Rate exceeded", which delays logs showing up in CloudWatch. If necessary, request a limit increase or reduce the data being sent. See the CloudWatch limits for more details.

    Overview

    We’ll explore how to set up Azure logging and monitoring.

    See the official Azure Monitor and AKS docs for more details.

    Azure Logging and Monitoring

    AKS monitoring is managed by Azure through Log Analytics.

    Once enabled, open the cluster’s Kubernetes service details in the Azure portal and review its Azure Monitor data under the Monitoring section: Insights, Logs, and Metrics.

    Enable Azure Monitor for the Cluster

    Enable the Log Analytics agent on the AKS cluster in the Cluster Configuration stack.

    import * as azure from "@pulumi/azure";
    
    // Create the AKS cluster with LogAnalytics enabled in the given workspace.
    const cluster = new azure.containerservice.KubernetesCluster(`${name}`, {
        ...
        resourceGroupName: config.resourceGroupName,
        addonProfile: {
            omsAgent: {
                enabled: true,
                logAnalyticsWorkspaceId: config.logAnalyticsWorkspaceId,
            },
        },
    });
    

    Enable logging for the control plane, and monitoring of all metrics in the Cluster Services stack.

    import * as azure from "@pulumi/azure";
    
    // Enable the Monitoring Diagnostic control plane component logs and AllMetrics.
    const azMonitoringDiagnostic = new azure.monitoring.DiagnosticSetting(name, {
        logAnalyticsWorkspaceId: config.logAnalyticsWorkspaceId,
        targetResourceId: config.clusterId,
        logs: ["kube-apiserver", "kube-controller-manager", "kube-scheduler", "kube-audit", "cluster-autoscaler"]
            .map(category => ({
                category,
                enabled : true,
                retentionPolicy: { enabled: true },
            })),
        metrics: [{
            category: "AllMetrics",
            retentionPolicy: { enabled: true },
        }],
    });
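
    The `logs` argument above is built by mapping a list of category names to identical settings objects. The standalone sketch below shows what that mapping produces, using plain objects. The `LogSetting` interface and `toLogSettings` helper are hypothetical names that mirror the fields used in the snippet.

```typescript
// Mirrors the fields passed for each log category in the snippet above.
interface LogSetting {
    category: string;
    enabled: boolean;
    retentionPolicy: { enabled: boolean };
}

const controlPlaneCategories: string[] = [
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
    "kube-audit",
    "cluster-autoscaler",
];

// Hypothetical helper: expand category names into full settings objects.
function toLogSettings(categories: string[]): LogSetting[] {
    return categories.map(category => ({
        category,
        enabled: true,
        retentionPolicy: { enabled: true },
    }));
}

const logSettings = toLogSettings(controlPlaneCategories);
console.log(JSON.stringify(logSettings[0]));
```

    Adding or removing a control plane component then only requires editing the category list.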
    

    Worker Nodes

    To get the worker kubelet logs, you need to SSH into the nodes.

    Use the node admin username and SSH key used in the Cluster Configuration stack.

    import * as azure from "@pulumi/azure";
    
    // Create the AKS cluster with a node admin username and SSH public key.
    const cluster = new azure.containerservice.KubernetesCluster(`${name}`, {
        ...
        resourceGroupName: config.resourceGroupName,
        linuxProfile: {
            adminUsername: "aksuser",
            sshKey: {
                keyData: sshPublicKey,
            },
        },
    });
    

    See the official AKS docs for more details.

    Overview

    We’ll explore how to set up Google Cloud logging and monitoring.

    See the official GKE and StackDriver Observing docs for more details.

    Google Cloud Logging and Monitoring

    GKE monitoring is managed by Google Cloud through Stackdriver.

    Stackdriver Kubernetes Engine Monitoring is the default logging option for GKE clusters, and it comes automatically enabled for all clusters starting with version 1.14.
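
    To check whether a given cluster version falls under that default, the major and minor parts of the GKE version string can be compared against 1.14. A minimal sketch, assuming version strings of the form `1.14.7-gke.17`. The function names are hypothetical.

```typescript
// Hypothetical helper: pull the major and minor numbers out of a GKE version string.
function parseGkeVersion(version: string): { major: number; minor: number } {
    const match = /^(\d+)\.(\d+)/.exec(version);
    if (!match) {
        throw new Error(`unrecognized GKE version: ${version}`);
    }
    return { major: Number(match[1]), minor: Number(match[2]) };
}

// True when the version is 1.14 or later, the threshold stated above.
function hasDefaultMonitoring(version: string): boolean {
    const { major, minor } = parseGkeVersion(version);
    return major > 1 || (major === 1 && minor >= 14);
}

const newerCluster = hasDefaultMonitoring("1.14.7-gke.17");
const olderCluster = hasDefaultMonitoring("1.13.11-gke.14");
console.log(newerCluster, olderCluster);
```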

    Enable the Node Pool

    Enable the cluster’s Node Pool with the proper logging and monitoring permissions in the Cluster Configuration stack.

    import * as gcp from "@pulumi/gcp";
    
    // Create a GKE cluster.
    // Versions >= 1.14 have Stackdriver Monitoring enabled by default.
    const cluster = new gcp.container.Cluster(`${name}`, {
        ...
        minMasterVersion: "1.14.7-gke.17",
    });
    
    // Create the GKE Node Pool with OAuth scopes enabled for logging and monitoring.
    const standardNodes = new gcp.container.NodePool("standard-nodes", {
        ...
        cluster: cluster.name,
        version: "1.14.7-gke.17",
        nodeConfig: {
            machineType: "n1-standard-1",
            oauthScopes: [
                "https://www.googleapis.com/auth/compute",
                "https://www.googleapis.com/auth/devstorage.read_only",
                "https://www.googleapis.com/auth/logging.write",
                "https://www.googleapis.com/auth/monitoring",
            ],
        },
    });
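
    A quick way to catch a Node Pool that lacks the logging or monitoring scopes is to compare its configured scopes against the two that matter here. This is a sketch under the assumption that `logging.write` and `monitoring` from the list above are the relevant scopes. The helper name is hypothetical.

```typescript
// Assumption: these two scopes from the Node Pool above are the ones
// needed for logging and monitoring.
const requiredScopes: string[] = [
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/monitoring",
];

// Hypothetical helper: return the required scopes that are not configured.
function missingScopes(configured: string[]): string[] {
    return requiredScopes.filter(scope => configured.indexOf(scope) === -1);
}

// Example: a Node Pool configured without the monitoring scope.
const missing = missingScopes([
    "https://www.googleapis.com/auth/compute",
    "https://www.googleapis.com/auth/logging.write",
]);
console.log(missing);
```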
    

    AWS Monitoring

    Using the YAML manifests in the AWS samples, we can provision the CloudWatch Agent to run as a DaemonSet and send metrics to CloudWatch.

    Install CloudWatch Agent

    Create a Namespace.

    $ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cloudwatch-namespace.yaml
    

    Create a ServiceAccount.

    $ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-serviceaccount.yaml
    

    Create a ConfigMap.

    $ curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-configmap.yaml | sed -e "s#{{cluster_name}}#`pulumi stack output clusterName`#g" | kubectl apply -f -
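
    The `sed` step above replaces the `{{cluster_name}}` placeholder in the downloaded manifest before it is applied. The same substitution can be sketched in a few lines. The `renderTemplate` helper and the sample template line are hypothetical, and unknown placeholders are left as they are.

```typescript
// Hypothetical helper: replace {{key}} placeholders with values, leaving unknown keys intact.
function renderTemplate(template: string, values: Record<string, string>): string {
    return template.replace(/\{\{(\w+)\}\}/g, (placeholder: string, key: string) =>
        Object.prototype.hasOwnProperty.call(values, key) ? values[key] : placeholder,
    );
}

// Sample line for illustration; the real manifest is fetched with curl.
const templateLine = 'cluster_name: "{{cluster_name}}", other: "{{unknown}}"';
const renderedLine = renderTemplate(templateLine, { cluster_name: "my-cluster" });
console.log(renderedLine);
```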
    

    Deploy the DaemonSet.

    $ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-daemonset.yaml
    

    Validate the deployment.

    $ kubectl get pods -n amazon-cloudwatch
    

    Verify the metrics setup in the CloudWatch console by navigating to Logs in your region, and looking for the following group.

    /aws/containerinsights/Cluster_Name/performance
    

    The cluster name can be retrieved from the cluster stack output.

    $ pulumi stack output clusterName
    

    You can also examine the stats in the CloudWatch console by navigating to Metrics in your region, and looking for the ContainerInsights for your cluster by its name.

    Clean Up.

    $ kubectl delete ns amazon-cloudwatch
    

    Datadog

    Deploy Datadog as a DaemonSet to aggregate Kubernetes, node, and container metrics and events, in addition to provider managed logging and monitoring.

    The full code for this app stack is on GitHub.

    import * as k8s from "@pulumi/kubernetes";
    
    const appName = "datadog";
    const appLabels = { app: appName };
    
    // Create a DataDog DaemonSet.
    const datadog = new k8s.apps.v1.DaemonSet(appName, {
        metadata: { labels: appLabels},
        spec: {
            selector: {
                matchLabels: appLabels,
            },
            template: {
                metadata: { labels: appLabels },
                spec: {
                    containers: [
                        {
                            image: "datadog/agent:latest",
                            name: appName,
                            resources: {limits: {memory: "512Mi"}, requests: {memory: "512Mi"}},
                            env: [
                                {
                                    name: "DD_KUBERNETES_KUBELET_HOST",
                                    valueFrom: {
                                        fieldRef: {
                                            fieldPath: "status.hostIP",
                                        },
                                    },
                                },
                                {
                                    name: "DD_API_KEY",
                                    valueFrom: {
                                        configMapKeyRef: {
                                            name: ddConfigMap.metadata.name,
                                            key: "DD_API_KEY",
                                        },
                                    },
                                },
                                {
                                    name: "DD_PROCESS_AGENT_ENABLED",
                                    valueFrom: {
                                        configMapKeyRef: {
                                            name: ddConfigMap.metadata.name,
                                            key: "DD_PROCESS_AGENT_ENABLED",
                                        },
                                    },
                                },
                                {
                                    name: "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
                                    valueFrom: {
                                        configMapKeyRef: {
                                            name: ddConfigMap.metadata.name,
                                            key: "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
                                        },
                                    },
                                },
                                {
                                    name: "DD_COLLECT_KUBERNETES_EVENTS",
                                    valueFrom: {
                                        configMapKeyRef: {
                                            name: ddConfigMap.metadata.name,
                                            key: "DD_COLLECT_KUBERNETES_EVENTS",
                                        },
                                    },
                                },
                                ...
                            ],
                            volumeMounts: [
                                {name: "dockersocket", mountPath: "/var/run/docker.sock"},
                                {name: "proc", mountPath: "/host/proc"},
                                {name: "cgroup", mountPath: "/host/sys/fs/cgroup"},
                            ],
                        },
                    ],
                    volumes: [
                        {name: "dockersocket", hostPath: {path: "/var/run/docker.sock"}},
                        {name: "proc", hostPath: {path: "/proc"}},
                        {name: "cgroup", hostPath: {path: "/sys/fs/cgroup"}},
                    ],
                },
            },
        },
    }, { provider: provider });
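
    Most of the `env` entries above repeat the same `configMapKeyRef` shape with a different key. One way to cut the repetition is a small helper that expands a list of keys. The sketch below uses plain strings, and the names `EnvVar`, `envFromConfigMap`, and the ConfigMap name are hypothetical. In the Pulumi program, the ConfigMap name would come from `ddConfigMap.metadata.name`.

```typescript
// Shape of an env entry that reads its value from a ConfigMap key.
interface EnvVar {
    name: string;
    valueFrom: { configMapKeyRef: { name: string; key: string } };
}

// Hypothetical helper: create one env entry per key, named after the key itself.
function envFromConfigMap(configMapName: string, keys: string[]): EnvVar[] {
    return keys.map(key => ({
        name: key,
        valueFrom: { configMapKeyRef: { name: configMapName, key } },
    }));
}

// Placeholder ConfigMap name for illustration only.
const datadogEnv = envFromConfigMap("datadog-config", [
    "DD_API_KEY",
    "DD_PROCESS_AGENT_ENABLED",
    "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
    "DD_COLLECT_KUBERNETES_EVENTS",
]);
console.log(JSON.stringify(datadogEnv[0], null, 2));
```

    Entries that use a different source, such as the `fieldRef` for `DD_KUBERNETES_KUBELET_HOST`, would still be written out by hand.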
    