Deploy Cluster Services

Cluster services are general services scoped at the Kubernetes cluster level. At a minimum, they tend to include logging and monitoring for the whole cluster or a subset of apps and workloads. They can also include policy enforcement and service meshes.

The full code for the AWS cluster services is on GitHub.

The full code for the Azure cluster services is on GitHub.

GKE logging and monitoring is managed by GCP through StackDriver.

The repo for the GCP cluster services is on GitHub, but it is empty since no extra steps are required after cluster and Node Pool creation in the Cluster Configuration stack.

The full code for the general cluster services is on GitHub.

Overview

We’ll explore how to set up:

See the official AWS docs for more details.

Prerequisites

Authenticate as the admins role from the Identity stack.

$ aws sts assume-role --role-arn `pulumi stack output adminsIamRoleArn` --role-session-name k8s-admin
$ export KUBECONFIG=`pwd`/kubeconfig-admin.json

AWS Logging

Control Plane

In the Recommended Settings of Creating the Control Plane, we enabled cluster logging for the various controllers of the control plane.

To view these logs, go to the CloudWatch console, navigate to the logs in your region, and look for the following group.

/aws/eks/Cluster_Name/cluster

The cluster name can be retrieved from the cluster stack output.

$ pulumi stack output clusterName

Worker Nodes and Pods

Configure Worker Node IAM Policy

To work with CloudWatch Logs, the identities created in Identity for each worker node group must have the proper permissions in IAM.

Attach the permissions to the IAM role for each nodegroup.

import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Parse out the role names, e.g. `roleName-123456` from `arn:aws:iam::123456789012:role/roleName-123456`.
const stdNodegroupIamRoleName = config.stdNodegroupIamRoleArn.apply(s => s.split("/")).apply(s => s[1]);
const perfNodegroupIamRoleName = config.perfNodegroupIamRoleArn.apply(s => s.split("/")).apply(s => s[1]);

// Create a new IAM Policy for fluentd-cloudwatch to manage CloudWatch Logs.
const name = "fluentd-cloudwatch";
const fluentdCloudWatchPolicy = new aws.iam.Policy(name,
    {
        description: "Allows fluentd-cloudwatch to work with CloudWatch Logs.",
        policy: JSON.stringify(
            {
                Version: "2012-10-17",
                Statement: [{Effect: "Allow", Action: ["logs:*"], Resource: ["arn:aws:logs:*:*:*"]}]
            }
        )
    },
);

// Attach the CloudWatch Logs policy to each node group's IAM role.
function attachLogPolicies(name: string, roleName: pulumi.Input<string>) {
    new aws.iam.RolePolicyAttachment(name,
        { policyArn: fluentdCloudWatchPolicy.arn, role: roleName },
    );
}

attachLogPolicies("stdRpa", stdNodegroupIamRoleName);
attachLogPolicies("perfRpa", perfNodegroupIamRoleName);
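The two-step `.apply` above simply splits the ARN on `/` and takes the second segment. As a plain-string sketch, outside of Pulumi's Output wrapper and with a hypothetical ARN value:

```typescript
// Hypothetical ARN for illustration; real values come from the Identity stack outputs.
const exampleArn = "arn:aws:iam::123456789012:role/roleName-123456";

// Same parsing as in the stack: split on "/" and take the segment after the slash.
const exampleRoleName = exampleArn.split("/")[1];

console.log(exampleRoleName); // roleName-123456
```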

Using the YAML manifests in the AWS samples, we can provision fluentd-cloudwatch to run as a DaemonSet and send worker and app logs to CloudWatch Logs.

Install fluentd

Create a Namespace.

$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cloudwatch-namespace.yaml

Create a ConfigMap.

$ kubectl create configmap cluster-info --from-literal=cluster.name=`pulumi stack output clusterName` --from-literal=logs.region=`pulumi stack output region` -n amazon-cloudwatch

Deploy the DaemonSet.

$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/fluentd/fluentd.yaml

Validate the deployment.

$ kubectl get pods -n amazon-cloudwatch

Verify the fluentd setup in the CloudWatch console by navigating to the logs in your region, and looking for the following groups.

/aws/containerinsights/Cluster_Name/application
/aws/containerinsights/Cluster_Name/host
/aws/containerinsights/Cluster_Name/dataplane

The cluster name can be retrieved from the cluster stack output.

$ pulumi stack output clusterName

Clean Up.

$ kubectl delete ns amazon-cloudwatch

Using the Helm chart, we can provision fluentd-cloudwatch in Pulumi to run as a DaemonSet and send worker and app logs to CloudWatch Logs.

Install fluentd

Deploy the Chart into the cluster-svcs namespace created in Configure Cluster Defaults.

import * as aws from "@pulumi/aws";
import * as k8s from "@pulumi/kubernetes";

const name = "fluentd-cloudwatch";

// Create a new provider to the cluster using the cluster's kubeconfig.
const provider = new k8s.Provider("provider", {kubeconfig: config.kubeconfig});

// Create a new CloudWatch Log group for fluentd-cloudwatch.
const fluentdCloudWatchLogGroup = new aws.cloudwatch.LogGroup(name);
export let fluentdCloudWatchLogGroupName = fluentdCloudWatchLogGroup.name;

// Deploy fluentd-cloudwatch using the Helm chart.
const fluentdCloudwatch = new k8s.helm.v2.Chart(name,
    {
        namespace: config.clusterSvcsNamespaceName,
        chart: "fluentd-cloudwatch",
        version: "0.11.0",
        fetchOpts: {
            repo: "https://kubernetes-charts-incubator.storage.googleapis.com/",
        },
        values: {
            extraVars: [ "{ name: FLUENT_UID, value: '0' }" ],
            rbac: {create: true},
            awsRegion: aws.config.region,
            logGroupName: fluentdCloudWatchLogGroup.name,
        },
        transformations: [
            (obj: any) => {
                // Do transformations on the YAML to set the namespace
                if (obj.metadata) {
                    obj.metadata.namespace = config.clusterSvcsNamespaceName;
                }
            },
        ],
    },
    {providers: { kubernetes: provider }},
);
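The transformation above is a plain function applied to each parsed manifest object before it is submitted to the cluster. In isolation, with a hypothetical namespace value, it behaves like this:

```typescript
// Hypothetical namespace value; in the stack it comes from config.clusterSvcsNamespaceName.
const namespaceName = "cluster-svcs";

// Same shape as the Chart transformation: set the namespace on any object that has metadata.
const setNamespace = (obj: any) => {
    if (obj.metadata) {
        obj.metadata.namespace = namespaceName;
    }
};

const manifest: any = { kind: "ServiceAccount", metadata: { name: "fluentd-cloudwatch" } };
setNamespace(manifest);

console.log(manifest.metadata.namespace); // cluster-svcs
```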

Validate the deployment.

$ kubectl get pods -n `pulumi stack output clusterSvcsNamespaceName`

Verify the fluentd setup in the CloudWatch console by navigating to the logs in your region, and looking for the group named by the following stack output.

$ pulumi stack output fluentdCloudWatchLogGroupName

Note: CloudWatch is rate limited, and the size of the data being sent can often cause a ThrottlingException: error="Rate exceeded". This can delay logs showing up in CloudWatch. Request a limit increase, or reduce the data being sent, if necessary. See the CloudWatch limits for more details.

Overview

We’ll explore how to set up:

See the official Azure Monitor and AKS docs for more details.

Azure Logging and Monitoring

AKS monitoring is managed by Azure through Log Analytics.

Once enabled, visit the cluster’s Kubernetes service details in the Azure portal, and analyze its Azure Monitor information in the Monitoring section: Insights, Logs, and Metrics.

Enable Azure Monitor for the Cluster

Enable the Log Analytics agent on the AKS cluster in the Cluster Configuration stack.

import * as azure from "@pulumi/azure";

// Create the AKS cluster with LogAnalytics enabled in the given workspace.
const cluster = new azure.containerservice.KubernetesCluster(`${name}`, {
    ...
    resourceGroupName: config.resourceGroupName,
    addonProfile: {
        omsAgent: {
            enabled: true,
            logAnalyticsWorkspaceId: config.logAnalyticsWorkspaceId,
        },
    },
});

Enable logging for the control plane, and monitoring of all metrics in the Cluster Services stack.

import * as azure from "@pulumi/azure";

// Enable diagnostic logging of the control plane components and AllMetrics monitoring.
const azMonitoringDiagnostic = new azure.monitoring.DiagnosticSetting(name, {
    logAnalyticsWorkspaceId: config.logAnalyticsWorkspaceId,
    targetResourceId: config.clusterId,
    logs: ["kube-apiserver", "kube-controller-manager", "kube-scheduler", "kube-audit", "cluster-autoscaler"]
        .map(category => ({
            category,
            enabled : true,
            retentionPolicy: { enabled: true },
        })),
    metrics: [{
        category: "AllMetrics",
        retentionPolicy: { enabled: true },
    }],
});
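The `.map` above expands each control plane log category into a full diagnostic settings object, so the five categories all share the same `enabled` and `retentionPolicy` values. Sketched with plain data:

```typescript
// Same expansion as in the DiagnosticSetting's `logs` property.
const logSettings = ["kube-apiserver", "kube-controller-manager", "kube-scheduler", "kube-audit", "cluster-autoscaler"]
    .map(category => ({
        category,
        enabled: true,
        retentionPolicy: { enabled: true },
    }));

console.log(logSettings.length); // 5
```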

Worker Nodes

To get the worker kubelet logs, you need to SSH into the nodes.

Use the node admin username and SSH key used in the Cluster Configuration stack.

import * as azure from "@pulumi/azure";

// Create the AKS cluster with LogAnalytics enabled in the given workspace.
const cluster = new azure.containerservice.KubernetesCluster(`${name}`, {
    ...
    resourceGroupName: config.resourceGroupName,
    linuxProfile: {
        adminUsername: "aksuser",
        sshKey: {
            keyData: sshPublicKey,
        },
    },
});

See the official AKS docs for more details.

Overview

We’ll explore how to set up:

See the official GKE and StackDriver Observing docs for more details.

GCP Logging and Monitoring

GKE monitoring is managed by GCP through StackDriver.

Stackdriver Kubernetes Engine Monitoring is the default logging option for GKE clusters, and it comes automatically enabled for all clusters starting with version 1.14.

Enable the Node Pool

Enable the cluster’s Node Pool with the proper logging and monitoring permissions in the Cluster Configuration stack.

import * as gcp from "@pulumi/gcp";

// Create a GKE cluster.
// Versions >= 1.14 have Stackdriver Monitoring enabled by default.
const cluster = new gcp.container.Cluster(`${name}`, {
    ...
    minMasterVersion: "1.14.7-gke.17",
});

// Create the GKE Node Pool with OAuth scopes enabled for logging and monitoring.
const standardNodes = new gcp.container.NodePool("standard-nodes", {
    ...
    cluster: cluster.name,
    version: "1.14.7-gke.17",
    nodeConfig: {
        machineType: "n1-standard-1",
        oauthScopes: [
            "https://www.googleapis.com/auth/compute",
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ],
    },
});

AWS Monitoring

Using the YAML manifests in the AWS samples, we can provision the CloudWatch Agent to run as a DaemonSet and send metrics to CloudWatch.

Install CloudWatch Agent

Create a Namespace.

$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cloudwatch-namespace.yaml

Create a ServiceAccount.

$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-serviceaccount.yaml

Create a ConfigMap.

$ curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-configmap.yaml | sed -e "s#{{cluster_name}}#`pulumi stack output clusterName`#g" | kubectl apply -f -
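The pipeline above templates the agent ConfigMap by substituting the `{{cluster_name}}` placeholder before piping the result to `kubectl`. The substitution itself is plain `sed`, shown here with a hypothetical cluster name in place of the `pulumi stack output clusterName` value:

```shell
echo "cluster_name: {{cluster_name}}" | sed -e "s#{{cluster_name}}#my-cluster#g"
# cluster_name: my-cluster
```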

Deploy the DaemonSet.

$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-daemonset.yaml

Validate the deployment.

$ kubectl get pods -n amazon-cloudwatch

Verify the metrics setup in the CloudWatch console by navigating to Logs in your region, and looking for the following group.

/aws/containerinsights/Cluster_Name/performance

The cluster name can be retrieved from the cluster stack output.

$ pulumi stack output clusterName

You can also examine the stats in the CloudWatch console by navigating to Metrics in your region, and looking for the ContainerInsights for your cluster by its name.

Clean Up.

$ kubectl delete ns amazon-cloudwatch

Datadog

Deploy Datadog as a DaemonSet to aggregate Kubernetes, node, and container metrics and events, in addition to provider-managed logging and monitoring.

The full code for this app stack is on GitHub.

import * as k8s from "@pulumi/kubernetes";

const appName = "datadog";
const appLabels = { app: appName };

// Create a DataDog DaemonSet.
const datadog = new k8s.apps.v1.DaemonSet(appName, {
    metadata: { labels: appLabels},
    spec: {
        selector: {
            matchLabels: appLabels,
        },
        template: {
            metadata: { labels: appLabels },
            spec: {
                containers: [
                    {
                        image: "datadog/agent:latest",
                        name: appName,
                        resources: {limits: {memory: "512Mi"}, requests: {memory: "512Mi"}},
                        env: [
                            {
                                name: "DD_KUBERNETES_KUBELET_HOST",
                                valueFrom: {
                                    fieldRef: {
                                        fieldPath: "status.hostIP",
                                    },
                                },
                            },
                            {
                                name: "DD_API_KEY",
                                valueFrom: {
                                    configMapKeyRef: {
                                        name: ddConfigMap.metadata.name,
                                        key: "DD_API_KEY",
                                    },
                                },
                            },
                            {
                                name: "DD_PROCESS_AGENT_ENABLED",
                                valueFrom: {
                                    configMapKeyRef: {
                                        name: ddConfigMap.metadata.name,
                                        key: "DD_PROCESS_AGENT_ENABLED",
                                    },
                                },
                            },
                            {
                                name: "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
                                valueFrom: {
                                    configMapKeyRef: {
                                        name: ddConfigMap.metadata.name,
                                        key: "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
                                    },
                                },
                            },
                            {
                                name: "DD_COLLECT_KUBERNETES_EVENTS",
                                valueFrom: {
                                    configMapKeyRef: {
                                        name: ddConfigMap.metadata.name,
                                        key: "DD_COLLECT_KUBERNETES_EVENTS",
                                    },
                                },
                            },
                            ...
                        ],
                        volumeMounts: [
                            {name: "dockersocket", mountPath: "/var/run/docker.sock"},
                            {name: "proc", mountPath: "/host/proc"},
                            {name: "cgroup", mountPath: "/host/sys/fs/cgroup"},
                        ],
                    },
                ],
                volumes: [
                    {name: "dockersocket", hostPath: {path: "/var/run/docker.sock"}},
                    {name: "proc", hostPath: {path: "/proc"}},
                    {name: "cgroup", hostPath: {path: "/sys/fs/cgroup"}},
                ],
            },
        },
    },
}, { provider: provider });
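The container’s environment entries reference a `ddConfigMap` defined elsewhere in the app stack (see the full code on GitHub). A minimal sketch of what it might hold, with placeholder values, is below; the key names match the env vars above, and in a real deployment `DD_API_KEY` should be sourced from a Pulumi config secret or Kubernetes Secret rather than a literal.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Sketch of the ConfigMap the DaemonSet references. Values are placeholders;
// source DD_API_KEY from a secret store in a real deployment.
const ddConfigMap = new k8s.core.v1.ConfigMap("datadog", {
    metadata: { labels: { app: "datadog" } },
    data: {
        DD_API_KEY: "<your-datadog-api-key>",
        DD_PROCESS_AGENT_ENABLED: "true",
        DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL: "true",
        DD_COLLECT_KUBERNETES_EVENTS: "true",
    },
}, { provider: provider });
```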