Kubernetes cluster services
Cluster services are general services scoped at the Kubernetes cluster level. At a minimum, these tend to include logging and monitoring for the whole cluster or for a subset of apps and workloads; they can also include policy enforcement and service meshes.
The full code for the AWS cluster services is on GitHub.
The full code for the Azure cluster services is on GitHub.
GKE logging and monitoring is managed by Google Cloud through StackDriver.
The repo for the Google Cloud cluster services is on GitHub, but it is empty since no extra steps are required after cluster and Node Pool creation in the Cluster Configuration stack.
The full code for the general cluster services is on GitHub.
Overview
We’ll explore how to set up logging and monitoring for the cluster.
See the official AWS docs for more details.
Prerequisites
Authenticate as the admins role from the Identity stack.
$ aws sts assume-role --role-arn `pulumi stack output adminsIamRoleArn` --role-session-name k8s-admin
$ export KUBECONFIG=`pwd`/kubeconfig-admin.json
AWS Logging
Control Plane
In the Recommended Settings of Creating the Control Plane, we enabled cluster logging for the various controllers of the control plane.
To view these logs, go to the CloudWatch console, navigate to the logs in your region, and look for the following group.
/aws/eks/Cluster_Name/cluster
The cluster name can be retrieved from the cluster stack output.
$ pulumi stack output clusterName
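The control plane log group follows a fixed naming pattern based on the cluster name. As a small illustrative helper (hypothetical, not part of the stack):

```typescript
// Build the CloudWatch log group name used for EKS control plane logs,
// following the /aws/eks/<cluster-name>/cluster pattern shown above.
function eksControlPlaneLogGroup(clusterName: string): string {
    return `/aws/eks/${clusterName}/cluster`;
}
```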
Worker Nodes and Pods
Configure Worker Node IAM Policy
To work with CloudWatch Logs, the identities created in Identity for each worker node group must have the proper permissions in IAM.
Attach the permissions to the IAM role for each nodegroup.
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Parse out the role names e.g. `roleName-123456` from `arn:aws:iam::123456789012:role/roleName-123456`.
const stdNodegroupIamRoleName = config.stdNodegroupIamRoleArn.apply(s => s.split("/")[1]);
const perfNodegroupIamRoleName = config.perfNodegroupIamRoleArn.apply(s => s.split("/")[1]);

// Create a new IAM Policy for fluentd-cloudwatch to manage CloudWatch Logs.
const name = "fluentd-cloudwatch";
const fluentdCloudWatchPolicy = new aws.iam.Policy(name,
    {
        description: "Allows fluentd-cloudwatch to work with CloudWatch Logs.",
        policy: JSON.stringify(
            {
                Version: "2012-10-17",
                Statement: [{Effect: "Allow", Action: ["logs:*"], Resource: ["arn:aws:logs:*:*:*"]}],
            }
        ),
    },
);

// Attach the CloudWatch Logs policy to a role, by role name.
function attachLogPolicies(name: string, roleName: pulumi.Input<string>) {
    new aws.iam.RolePolicyAttachment(name,
        { policyArn: fluentdCloudWatchPolicy.arn, role: roleName },
    );
}

attachLogPolicies("stdRpa", stdNodegroupIamRoleName);
attachLogPolicies("perfRpa", perfNodegroupIamRoleName);
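The ARN-to-role-name parsing above can be captured as a plain function, which is easier to reason about and test than the inline `.apply` chain (a sketch; the helper name is ours):

```typescript
// Extract the role name from an IAM role ARN, e.g.
// "arn:aws:iam::123456789012:role/roleName-123456" -> "roleName-123456".
function roleNameFromArn(arn: string): string {
    const parts = arn.split("/");
    if (parts.length < 2) {
        throw new Error(`Unexpected IAM role ARN format: ${arn}`);
    }
    return parts[parts.length - 1];
}
```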
Using the YAML manifests in the AWS samples, we can provision fluentd-cloudwatch to run as a DaemonSet and send worker and app logs to CloudWatch Logs.
Install fluentd
Create a Namespace.
$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cloudwatch-namespace.yaml
Create a ConfigMap.
$ kubectl create configmap cluster-info --from-literal=cluster.name=`pulumi stack output clusterName` --from-literal=logs.region=`pulumi stack output region` -n amazon-cloudwatch
Deploy the DaemonSet.
$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/fluentd/fluentd.yaml
Validate the deployment.
$ kubectl get pods -n amazon-cloudwatch
Verify the fluentd setup in the CloudWatch console by navigating to the logs in your region, and looking for the following groups.
/aws/containerinsights/Cluster_Name/application
/aws/containerinsights/Cluster_Name/host
/aws/containerinsights/Cluster_Name/dataplane
The cluster name can be retrieved from the cluster stack output.
$ pulumi stack output clusterName
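The Container Insights log groups also follow a fixed naming pattern per cluster; as an illustrative helper (hypothetical, not part of the stack):

```typescript
// Build the Container Insights log group names for a cluster, matching the
// /aws/containerinsights/<cluster-name>/{application,host,dataplane} groups above.
function containerInsightsLogGroups(clusterName: string): string[] {
    return ["application", "host", "dataplane"].map(
        suffix => `/aws/containerinsights/${clusterName}/${suffix}`);
}
```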
Clean Up.
$ kubectl delete ns amazon-cloudwatch
Using the Helm chart, we can provision fluentd-cloudwatch in Pulumi to run as a DaemonSet and send worker and app logs to CloudWatch Logs.
Install fluentd
Deploy the Chart into the cluster-svcs namespace created in Configure Cluster Defaults.
import * as aws from "@pulumi/aws";
import * as k8s from "@pulumi/kubernetes";

// Create a new provider to the cluster using the cluster's kubeconfig.
const provider = new k8s.Provider("provider", {kubeconfig: config.kubeconfig});

// Create a new CloudWatch Log Group for fluentd-cloudwatch.
const name = "fluentd-cloudwatch";
const fluentdCloudWatchLogGroup = new aws.cloudwatch.LogGroup(name);
export let fluentdCloudWatchLogGroupName = fluentdCloudWatchLogGroup.name;

// Deploy fluentd-cloudwatch using the Helm chart.
const fluentdCloudwatch = new k8s.helm.v3.Chart(name,
    {
        namespace: config.clusterSvcsNamespaceName,
        chart: "fluentd-cloudwatch",
        version: "0.11.0",
        fetchOpts: {
            repo: "https://charts.helm.sh/incubator",
        },
        values: {
            extraVars: [ "{ name: FLUENT_UID, value: '0' }" ],
            rbac: {create: true},
            awsRegion: aws.config.region,
            logGroupName: fluentdCloudWatchLogGroup.name,
        },
        transformations: [
            (obj: any) => {
                // Transform the chart's manifests to use the cluster services namespace.
                if (obj.metadata) {
                    obj.metadata.namespace = config.clusterSvcsNamespaceName;
                }
            },
        ],
    },
    {providers: { kubernetes: provider }},
);
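The transformation used above is just a function over the rendered manifests. Pulled out as a plain helper (the name `setNamespace` is ours, for illustration), its effect is easy to see:

```typescript
// Force a rendered Kubernetes object into the given namespace.
// Objects without metadata (e.g. comments stripped to null) are left alone.
function setNamespace(obj: any, namespace: string): any {
    if (obj && obj.metadata) {
        obj.metadata.namespace = namespace;
    }
    return obj;
}
```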
Validate the deployment.
$ kubectl get pods -n `pulumi stack output clusterSvcsNamespaceName`
Verify the fluentd setup in the CloudWatch console by navigating to the logs in your region, and looking for the log group named in the following stack output.
$ pulumi stack output fluentdCloudWatchLogGroupName
Note: CloudWatch API calls are rate limited, and the volume of data being sent can trigger ThrottlingException error="Rate exceeded". This can delay logs showing up in CloudWatch. Request a limit increase, or reduce the data being sent, if necessary. See the CloudWatch limits for more details.
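One generic client-side mitigation for throttled calls is exponential backoff between retries. A minimal sketch of such a schedule (our own illustration, not part of fluentd or CloudWatch configuration):

```typescript
// Exponential backoff schedule in milliseconds: base, 2*base, 4*base, ...
// Useful when retrying calls that fail with "Rate exceeded".
function backoffDelaysMs(attempts: number, baseMs: number = 100): number[] {
    return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}
```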
Overview
We’ll explore how to set up logging and monitoring for the cluster.
See the official Azure Monitor and AKS docs for more details.
Azure Logging and Monitoring
AKS monitoring is managed by Azure through Log Analytics.
Once enabled, visit the cluster’s Kubernetes service details in the Azure portal, and analyze its Azure Monitor information in the Monitoring section’s Insights, Logs, and Metrics views.
Enable Azure Monitor for the Cluster
Enable the Log Analytics agent on the AKS cluster in the Cluster Configuration stack.
import * as azure from "@pulumi/azure";
// Create the AKS cluster with LogAnalytics enabled in the given workspace.
const cluster = new azure.containerservice.KubernetesCluster(`${name}`, {
...
resourceGroupName: config.resourceGroupName,
addonProfile: {
omsAgent: {
enabled: true,
logAnalyticsWorkspaceId: config.logAnalyticsWorkspaceId,
},
},
});
Enable logging for the control plane, and monitoring of all metrics in the Cluster Services stack.
import * as azure from "@pulumi/azure";
// Enable the Monitoring Diagnostic control plane component logs and AllMetrics.
const azMonitoringDiagnostic = new azure.monitoring.DiagnosticSetting(name, {
logAnalyticsWorkspaceId: config.logAnalyticsWorkspaceId,
targetResourceId: config.clusterId,
logs: ["kube-apiserver", "kube-controller-manager", "kube-scheduler", "kube-audit", "cluster-autoscaler"]
.map(category => ({
category,
enabled : true,
retentionPolicy: { enabled: true },
})),
metrics: [{
category: "AllMetrics",
retentionPolicy: { enabled: true },
}],
});
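The `.map` in the stack above expands each control plane log category into a full diagnostic log setting. Factored into a plain function (the interface and helper names are ours, for illustration):

```typescript
interface DiagnosticLogSetting {
    category: string;
    enabled: boolean;
    retentionPolicy: { enabled: boolean };
}

// Expand a list of control plane log categories into enabled diagnostic
// log settings with retention turned on, as done in the stack above.
function toLogSettings(categories: string[]): DiagnosticLogSetting[] {
    return categories.map(category => ({
        category,
        enabled: true,
        retentionPolicy: { enabled: true },
    }));
}
```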
Worker Nodes
To get the worker kubelet logs, you need to SSH into the nodes.
Use the node admin username and SSH key used in the Cluster Configuration stack.
import * as azure from "@pulumi/azure";
// Create the AKS cluster with LogAnalytics enabled in the given workspace.
const cluster = new azure.containerservice.KubernetesCluster(`${name}`, {
...
resourceGroupName: config.resourceGroupName,
linuxProfile: {
adminUsername: "aksuser",
sshKey: {
keyData: sshPublicKey,
},
},
});
See the official AKS docs for more details.
Overview
We’ll explore how to set up logging and monitoring for the cluster.
See the official GKE and StackDriver Observing docs for more details.
Google Cloud Logging and Monitoring
GKE monitoring is managed by Google Cloud through StackDriver.
Stackdriver Kubernetes Engine Monitoring is the default logging option for GKE clusters, and it comes automatically enabled for all clusters starting with version 1.14.
Enable the Node Pool
Enable the cluster’s Node Pool with the proper logging and monitoring permission in the Cluster Configuration stack.
import * as gcp from "@pulumi/gcp";
// Create a GKE cluster.
// Versions >= 1.14 have Stackdriver Monitoring enabled by default.
const cluster = new gcp.container.Cluster(`${name}`, {
    ...
    minMasterVersion: "1.14.7-gke.17",
});
// Create the GKE Node Pool with OAuth scopes enabled for logging and monitoring.
const standardNodes = new gcp.container.NodePool("standard-nodes", {
...
cluster: cluster.name,
version: "1.14.7-gke.17",
nodeConfig: {
machineType: "n1-standard-1",
oauthScopes: [
"https://www.googleapis.com/auth/compute",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
],
},
});
AWS Monitoring
Using the YAML manifests in the AWS samples, we can provision the CloudWatch Agent to run as a DaemonSet and send metrics to CloudWatch.
Install CloudWatch Agent
Create a Namespace.
$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cloudwatch-namespace.yaml
Create a ServiceAccount.
$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-serviceaccount.yaml
Create a ConfigMap.
$ curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-configmap.yaml | sed -e "s#{{cluster_name}}#`pulumi stack output clusterName`#g" | kubectl apply -f -
Deploy the DaemonSet.
$ kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-yaml-templates/cwagent-kubernetes-monitoring/cwagent-daemonset.yaml
Validate the deployment.
$ kubectl get pods -n amazon-cloudwatch
Verify the metrics setup in the CloudWatch console by navigating to Logs in your region, and looking for the following group.
/aws/containerinsights/Cluster_Name/performance
The cluster name can be retrieved from the cluster stack output.
$ pulumi stack output clusterName
You can also examine the stats in the CloudWatch console by navigating to Metrics in your region, and looking for the ContainerInsights for your cluster by its name.
Clean Up.
$ kubectl delete ns amazon-cloudwatch
Datadog
Deploy Datadog as a DaemonSet to aggregate Kubernetes, node, and container metrics and events, in addition to provider managed logging and monitoring.
The full code for this app stack is on GitHub.
import * as k8s from "@pulumi/kubernetes";
const appName = "datadog";
const appLabels = { app: appName };
// Create a DataDog DaemonSet.
const datadog = new k8s.apps.v1.DaemonSet(appName, {
metadata: { labels: appLabels},
spec: {
selector: {
matchLabels: appLabels,
},
template: {
metadata: { labels: appLabels },
spec: {
containers: [
{
image: "datadog/agent:latest",
name: appName,
resources: {limits: {memory: "512Mi"}, requests: {memory: "512Mi"}},
env: [
{
name: "DD_KUBERNETES_KUBELET_HOST",
valueFrom: {
fieldRef: {
fieldPath: "status.hostIP",
},
},
},
{
name: "DD_API_KEY",
valueFrom: {
configMapKeyRef: {
name: ddConfigMap.metadata.name,
key: "DD_API_KEY",
},
},
},
{
name: "DD_PROCESS_AGENT_ENABLED",
valueFrom: {
configMapKeyRef: {
name: ddConfigMap.metadata.name,
key: "DD_PROCESS_AGENT_ENABLED",
},
},
},
{
name: "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
valueFrom: {
configMapKeyRef: {
name: ddConfigMap.metadata.name,
key: "DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL",
},
},
},
{
name: "DD_COLLECT_KUBERNETES_EVENTS",
valueFrom: {
configMapKeyRef: {
name: ddConfigMap.metadata.name,
key: "DD_COLLECT_KUBERNETES_EVENTS",
},
},
},
...
],
volumeMounts: [
{name: "dockersocket", mountPath: "/var/run/docker.sock"},
{name: "proc", mountPath: "/host/proc"},
{name: "cgroup", mountPath: "/host/sys/fs/cgroup"},
],
},
],
volumes: [
{name: "dockersocket", hostPath: {path: "/var/run/docker.sock"}},
{name: "proc", hostPath: {path: "/proc"}},
{name: "cgroup", hostPath: {path: "/sys/fs/cgroup"}},
],
},
},
},
}, { provider: provider });
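The DaemonSet above repeats the same configMapKeyRef pattern for each Datadog setting, reading each env var from a key of the same name in ddConfigMap. That pattern can be expressed as a small helper (the interface and helper names are ours, for illustration):

```typescript
interface ConfigMapEnvVar {
    name: string;
    valueFrom: { configMapKeyRef: { name: string; key: string } };
}

// Build env entries that each read their value from the ConfigMap key
// matching the env var name, as done for DD_API_KEY etc. above.
function envFromConfigMap(configMapName: string, keys: string[]): ConfigMapEnvVar[] {
    return keys.map(key => ({
        name: key,
        valueFrom: { configMapKeyRef: { name: configMapName, key } },
    }));
}
```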