Automated Incident Response with AWS SSM Contacts and AI Services
Automated incident response on AWS can significantly speed up incident handling by acting immediately on defined triggers, such as infrastructure anomalies or security events. AWS Systems Manager Incident Manager (exposed through the SSM Contacts API) lets you define contacts and escalation plans for incident response. Detection logic, including any AI-based anomaly detection, can run in AWS Lambda and be invoked through Amazon CloudWatch to trigger automated responses.
Below is a Pulumi program written in Python that illustrates how you can set up an automated incident response system using AWS SSM Contacts and AI Services. The following resources will be created:
- SSM `Contact` and `ContactChannel` to define who should be contacted when an incident occurs.
- Lambda `Function`, which hosts the detection logic that identifies anomalies and triggers the incident response.
- CloudWatch `Rule` to invoke the Lambda function on a schedule or in response to specific triggers.
- SSM `Document` used by Systems Manager to automate the actual incident response.
- IAM Roles and Policies that grant the permissions required by the Lambda function and the SSM automation.
```python
import json

import pulumi
import pulumi_aws as aws

# An Incident Manager contact: the person or service to engage during an incident.
# Contacts live in the aws.ssmcontacts module and require an Incident Manager
# replication set to already exist in the account.
contact = aws.ssmcontacts.Contact("contact",
    alias="incident-responder",  # Example alias; replace with one that suits your setup
    type="PERSONAL",
)

# A contact channel: the method of contact, such as SMS, voice, or email.
contact_channel = aws.ssmcontacts.ContactChannel("contactChannel",
    contact_id=contact.arn,
    name="primary-email",  # Example channel name
    type="EMAIL",
    delivery_address=aws.ssmcontacts.ContactChannelDeliveryAddressArgs(
        simple_address="example@example.com",  # Replace with actual email address or phone number
    ),
)

# The engagement plan: which channels to try at each stage, and for how long.
contact_plan = aws.ssmcontacts.Plan("contactPlan",
    contact_id=contact.arn,
    stages=[aws.ssmcontacts.PlanStageArgs(
        duration_in_minutes=15,
        targets=[aws.ssmcontacts.PlanStageTargetArgs(
            channel_target_info=aws.ssmcontacts.PlanStageTargetChannelTargetInfoArgs(
                contact_channel_id="ABCDE12345",  # Replace with actual channel ID (for example contact_channel.arn)
                retry_interval_in_minutes=1,
            ),
        )],
    )],
)

# IAM role assumed by the Lambda function. The trust policy must be a JSON string.
lambda_role = aws.iam.Role("lambdaRole",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
        }],
    }),
)

# Attach the managed policy that lets the function run and write logs.
attach_policy = aws.iam.RolePolicyAttachment("lambda-attach-policy",
    policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    role=lambda_role.name,
)

# The Lambda function that holds the detection logic.
lambda_function = aws.lambda_.Function("incidentResponseFunction",
    code=pulumi.FileArchive("./lambda.zip"),  # Assumes a lambda.zip file containing the code
    role=lambda_role.arn,
    handler="index.handler",  # The Lambda function entry point
    runtime="python3.12",  # Must match the language version of the packaged code
)

# A CloudWatch Events (EventBridge) rule that fires on a schedule or on an event.
cloudwatch_event_rule = aws.cloudwatch.EventRule("cloudwatchEventRule",
    description="A CloudWatch Event Rule that triggers on incident",
    schedule_expression="rate(5 minutes)",  # Example schedule: every 5 minutes
)

# Point the rule at the Lambda function.
cloudwatch_event_target = aws.cloudwatch.EventTarget("cloudwatchEventTarget",
    rule=cloudwatch_event_rule.name,
    arn=lambda_function.arn,
)

# Grant the rule permission to invoke the function; the target alone does not do this.
invoke_permission = aws.lambda_.Permission("allowEventRuleInvoke",
    action="lambda:InvokeFunction",
    function=lambda_function.name,
    principal="events.amazonaws.com",
    source_arn=cloudwatch_event_rule.arn,
)

# The SSM Automation document that runs once the Lambda function identifies an incident.
ssm_document = aws.ssm.Document("ssmIncidentResponse",
    document_type="Automation",
    document_format="JSON",
    content=pulumi.Output.all(contact.id, contact_channel.id).apply(lambda args: json.dumps({
        "schemaVersion": "0.3",  # Automation runbooks use schema version 0.3
        "description": "Respond to an incident",
        "mainSteps": [{
            "name": "respondToIncident",
            "action": "aws:executeAutomation",
            "inputs": {
                # Example runbook name; confirm it exists in your account or substitute your own.
                "DocumentName": "AWS-RespondToHighSeverityAlarm",
                # Assumes the child runbook declares these two parameters.
                "RuntimeParameters": {
                    "Contact": [args[0]],
                    "ContactChannels": [args[1]],
                },
            },
        }],
    })),
)

pulumi.export("lambda_function_arn", lambda_function.arn)
pulumi.export("contact_id", contact.id)
pulumi.export("contact_channel_id", contact_channel.id)
pulumi.export("ssm_document_name", ssm_document.name)
```
In this program:
- The SSM Contact and Contact Channel define the people or services to contact in case of an incident.
- `lambda_function` acts as a detection system; it can be triggered by other AWS services or cloud events that you specify in a `cloudwatch_event_rule`.
- The `ssm_document` provides automation that AWS Systems Manager can execute in response to an incident. It references the SSM Contact and Channel as parameters for the automation.
- `lambda_role` and `attach_policy` are required to give the Lambda function the necessary permissions to run.
- `cloudwatch_event_target` sets up the CloudWatch event rule to target the Lambda function.
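The rule in the program fires on a schedule, but the same kind of rule can also react to specific events. As a minimal sketch, the helper below builds an event pattern that matches CloudWatch alarms entering the ALARM state. The function name `alarm_state_change_pattern` and the alarm name `HighCPU` are illustrative assumptions and do not appear in the program above.

```python
import json


def alarm_state_change_pattern(alarm_names=None):
    """Return a JSON event pattern matching CloudWatch alarms that enter ALARM.

    alarm_names is an optional list of alarm names to restrict the match to;
    when omitted, any alarm entering ALARM matches.
    """
    detail = {"state": {"value": ["ALARM"]}}
    if alarm_names:
        detail["alarmName"] = list(alarm_names)
    pattern = {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": detail,
    }
    return json.dumps(pattern)


if __name__ == "__main__":
    # "HighCPU" is a hypothetical alarm name used only for illustration.
    print(alarm_state_change_pattern(["HighCPU"]))
```

The resulting string could be passed as the `event_pattern` argument of `aws.cloudwatch.EventRule` in place of `schedule_expression`, so the Lambda function runs only when a matching alarm fires.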
This program must be accompanied by the actual Lambda function code, packaged in a file named `lambda.zip`, which contains the logic for detecting incidents and triggering the SSM document in response. Make sure the `handler` and `runtime` parameters match your Lambda code.

Also remember to replace placeholder values such as "ABCDE12345" and "example@example.com" with values appropriate for your setup.

When you apply this Pulumi program, it creates all of these resources. When the Lambda code then detects an incident and triggers the response, the contact is notified through the specified Contact Channel and the SSM Document automation starts executing, so responders are alerted quickly and the predefined response steps begin without delay.
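The Lambda code itself is not shown in this article. The following is a minimal sketch of what an `index.py` inside `lambda.zip` might look like, under several assumptions: the event shape (`history` and `latest`), the environment variable names `CONTACT_ARN` and `SSM_DOCUMENT_NAME`, and the z-score check are all hypothetical stand-ins, and the z-score is a simple statistical placeholder for whatever detection model you actually use. The boto3 calls `start_engagement` and `start_automation_execution` are the SSM Contacts and SSM operations for paging a contact and starting a runbook.

```python
import os
import statistics


def is_anomalous(history, latest, threshold=3.0):
    """Flag latest if it is more than threshold standard deviations from the mean of history."""
    if len(history) < 2:
        return False  # Not enough data to estimate spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > threshold


def respond(contacts_client, ssm_client, contact_arn, document_name, summary):
    """Page the contact, start the runbook, and return the automation execution ID."""
    contacts_client.start_engagement(
        ContactId=contact_arn,
        Sender="incident-response-lambda",  # Hypothetical sender label
        Subject="Anomaly detected",
        Content=summary,
    )
    result = ssm_client.start_automation_execution(DocumentName=document_name)
    return result["AutomationExecutionId"]


def handler(event, context):
    # Hypothetical event shape: {"history": [numbers], "latest": number}.
    # A real function would more likely fetch recent metric values itself.
    history = event.get("history", [])
    latest = event.get("latest")
    if latest is None or not is_anomalous(history, latest):
        return {"incident": False}

    import boto3  # Imported lazily so the detection logic can be tested offline

    execution_id = respond(
        boto3.client("ssm-contacts"),
        boto3.client("ssm"),
        os.environ["CONTACT_ARN"],        # Assumed environment variable
        os.environ["SSM_DOCUMENT_NAME"],  # Assumed environment variable
        f"Latest value {latest} deviates from recent history.",
    )
    return {"incident": True, "automation_execution_id": execution_id}
```

For this sketch to work, the Lambda role would need permissions beyond basic execution, namely `ssm-contacts:StartEngagement` and `ssm:StartAutomationExecution`, and the two environment variables would have to be set on the function, for example from the exported contact ID and document name.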