1. Automated Anomaly Detection for AI Workloads using CloudWatch


    To set up automated anomaly detection for AI workloads using AWS CloudWatch, you'll typically need to create a CloudWatch Alarm that triggers on an anomaly detection model. Anomaly detection models in CloudWatch allow you to establish a normal baseline of metrics, against which anomalies will be detected. When an anomaly is detected according to the model's confidence bands, the CloudWatch Alarm can perform various actions, such as sending a notification to an SNS topic or invoking an AWS Lambda function.

    Below is a Pulumi Python program that defines:

    • A CloudWatch metric alarm that uses an anomaly detection model.
    • An SNS topic where notifications will be sent if the alarm state changes.

    The alarm tracks a specific metric from your AI workload, which in this example is the number of invocations of an AWS Lambda function (this could be part of your AI workload). The alarm is configured to trigger if the invocations are higher or lower than expected, based on the anomaly detection model.

    import json

    import pulumi
    import pulumi_aws as aws

    # Create an SNS topic that will receive notifications when the alarm state changes
    alarm_topic = aws.sns.Topic("alarmTopic")

    # Define the necessary permission to allow CloudWatch alarms to publish to the SNS topic
    alarm_topic_policy = aws.sns.TopicPolicy("alarmTopicPolicy",
        arn=alarm_topic.arn,
        policy=alarm_topic.arn.apply(lambda arn: json.dumps({
            "Version": "2012-10-17",
            "Id": "default",
            "Statement": [{
                "Sid": "AllowPublishFromCloudWatchAlarms",
                "Effect": "Allow",
                "Principal": {"Service": "cloudwatch.amazonaws.com"},
                "Action": "SNS:Publish",
                "Resource": arn,
            }],
        })),
    )

    # Define a CloudWatch metric alarm based on anomaly detection
    anomaly_detection_alarm = aws.cloudwatch.MetricAlarm("anomalyDetectionAlarm",
        comparison_operator="LessThanLowerOrGreaterThanUpperThreshold",
        # Use the band produced by the "e1" expression as the threshold
        threshold_metric_id="e1",
        # Configure the number of evaluation periods
        evaluation_periods=2,
        # Set actions to trigger when alarm state changes
        alarm_actions=[alarm_topic.arn],
        ok_actions=[alarm_topic.arn],
        # Treat missing data as notBreaching; this could be changed based on use case
        treat_missing_data="notBreaching",
        # Define the metrics for anomaly detection. When metric queries are used,
        # the namespace, metric name, dimensions, and statistic go inside the "m1"
        # query and must not also be set at the top level of the alarm.
        metric_queries=[
            aws.cloudwatch.MetricAlarmMetricQueryArgs(
                id="e1",
                expression="ANOMALY_DETECTION_BAND(m1, 2)",
                label="Invocations (Anomaly Detection)",
                return_data=True,
            ),
            aws.cloudwatch.MetricAlarmMetricQueryArgs(
                id="m1",
                return_data=True,
                metric=aws.cloudwatch.MetricAlarmMetricQueryMetricArgs(
                    # Use a namespace, metric_name and dimensions based on your AI workload
                    namespace="AWS/Lambda",
                    metric_name="Invocations",
                    dimensions={
                        "FunctionName": "your-ai-lambda-function-name",
                    },
                    stat="Sum",
                    period=300,
                    unit="Count",
                ),
            ),
        ],
    )

    # Export the name of the topic and the ARN of the CloudWatch alarm for reference
    pulumi.export('alarm_topic_name', alarm_topic.name)
    pulumi.export('cloudwatch_alarm_arn', anomaly_detection_alarm.arn)

    This Pulumi program configures anomaly detection for a hypothetical AWS Lambda function that's part of an AI workload. It uses the aws.cloudwatch.MetricAlarm resource (CloudWatch Metric Alarm) to define a metric alarm based on the number of function invocations.

    Two MetricAlarmMetricQueryArgs entries define what the alarm evaluates. The entry with id m1 is the raw data: the sum of Lambda invocations over each 5-minute period (period=300). The entry with id e1 is the expression ANOMALY_DETECTION_BAND(m1, 2), which builds the expected band around m1; the second argument, 2, is the band width in standard deviations, so a larger value gives a wider band and fewer alarms. Setting threshold_metric_id="e1" tells CloudWatch to use that band as the alarm threshold.
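
    To build intuition for the band width and for evaluation_periods, here is a minimal sketch in plain Python. It is a simplification: CloudWatch trains its own model on the metric's history (including trend and seasonality), whereas this sketch assumes a static mean and standard deviation. The function names and the sample numbers are hypothetical.

```python
from statistics import mean, stdev


def anomaly_band(history, width=2.0):
    """Return (lower, upper) as mean -/+ width * sample standard deviation."""
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def breaches(history, recent, width=2.0):
    """Flag each recent datapoint that falls outside the band."""
    lower, upper = anomaly_band(history, width)
    return [value < lower or value > upper for value in recent]


def should_alarm(flags, evaluation_periods=2):
    """Alarm only if the last `evaluation_periods` datapoints all breached."""
    if len(flags) < evaluation_periods:
        return False
    return all(flags[-evaluation_periods:])


# Hypothetical 5-minute invocation sums used as the "normal" baseline.
history = [100, 104, 98, 101, 97, 103, 99, 102]
flags = breaches(history, [101, 140, 60])
print(flags)                # [False, True, True]
print(should_alarm(flags))  # True: the last two periods are outside the band
```

    A width of 2 matches the second argument of ANOMALY_DETECTION_BAND(m1, 2); raising it widens the band, so fewer datapoints count as breaches.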

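    When the metric stays outside the band for the configured number of evaluation periods, the alarm moves to the ALARM state and runs the actions listed in alarm_actions. When the metric returns inside the band, the alarm moves back to OK and runs the actions listed in ok_actions. Here, both lists point to the SNS topic defined in this program, so a notification is published on each of those transitions.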
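
    If a Lambda function is subscribed to the SNS topic (the program above does not create such a subscription, so this is an assumption), it receives the alarm notification as a JSON string inside the SNS record. The sketch below shows one way such a handler might read the state transition; the function names are hypothetical, and only a few of the fields in the alarm message are used.

```python
import json


def summarize_alarm_event(event):
    """Extract the alarm name and state transition from each SNS record."""
    summaries = []
    for record in event.get("Records", []):
        # CloudWatch publishes the alarm details as a JSON-encoded string.
        message = json.loads(record["Sns"]["Message"])
        summaries.append({
            "alarm": message.get("AlarmName"),
            "old_state": message.get("OldStateValue"),
            "new_state": message.get("NewStateValue"),
            "reason": message.get("NewStateReason"),
        })
    return summaries


def handler(event, context):
    """Lambda entry point: log each transition and return the summaries."""
    summaries = summarize_alarm_event(event)
    for item in summaries:
        print(f"{item['alarm']}: {item['old_state']} -> {item['new_state']}")
    return summaries
```

    Because alarm_actions and ok_actions both target the same topic, the handler sees transitions into ALARM and back to OK, and can branch on new_state.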

    Remember to replace "your-ai-lambda-function-name" with the actual name of your Lambda function.

    To use this program:

    1. Install the Pulumi CLI and configure your AWS credentials.
    2. Create a new Pulumi Python project.
    3. Write this Python program in a file named __main__.py.
    4. Run pulumi up to deploy the resources.

    Once the deployment completes, the Pulumi CLI prints the exported values: the SNS topic name and the alarm ARN. Use them to confirm the alarm's configuration in the CloudWatch console, or subscribe to the SNS topic and watch for notifications.
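
    The same check can be scripted. The sketch below assumes boto3 is installed and AWS credentials are configured; it takes the exported alarm ARN, looks the alarm up with the CloudWatch describe_alarms call, and returns its state and band expression. The helper names are hypothetical, and the client is passed in so the function can be exercised without touching AWS.

```python
def alarm_name_from_arn(alarm_arn):
    """CloudWatch alarm ARNs end in ':alarm:<name>'; return the name part."""
    return alarm_arn.split(":alarm:", 1)[1]


def summarize_alarm(client, alarm_arn):
    """Return the state and anomaly band expression of one alarm, or None."""
    name = alarm_name_from_arn(alarm_arn)
    response = client.describe_alarms(AlarmNames=[name])
    alarms = response.get("MetricAlarms", [])
    if not alarms:
        return None
    alarm = alarms[0]
    return {
        "name": alarm["AlarmName"],
        "state": alarm["StateValue"],
        "threshold_metric_id": alarm.get("ThresholdMetricId"),
        "expressions": [
            query["Expression"]
            for query in alarm.get("Metrics", [])
            if "Expression" in query
        ],
    }


# Usage against a real account (assumes boto3 and configured credentials):
#   import boto3
#   client = boto3.client("cloudwatch")
#   print(summarize_alarm(client, "<value of cloudwatch_alarm_arn>"))
```

    A freshly created anomaly detection alarm may report INSUFFICIENT_DATA until CloudWatch has collected enough datapoints to evaluate the band.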