Automated Rollbacks for Faulty AI Service Deployments

Pulumi · Accepted Answer

Rolling back faulty deployments automatically is a key part of a reliable continuous deployment process, especially for AI services, where a change to a model or its data can noticeably affect the performance and stability of the application.

In this guide, we'll write a Pulumi program in Python that sets up automated rollbacks for a faulty AI service deployment on AWS using AWS CodeDeploy. AWS CodeDeploy is a service that automates application deployments to compute services such as Amazon EC2, on-premises instances, Amazon ECS, and AWS Lambda.

The main components of the rollback mechanism in AWS CodeDeploy are:
- **Deployment Group**: A set of individual instances or resources where the application will be deployed.
- **Deployment Config**: Settings that control the deployment process, like the minimum number of healthy hosts.
- **Auto Rollback Configuration**: A feature that automatically rolls back the deployment when it fails or, if alarms are configured, when a monitored alarm is triggered.
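
To make the Deployment Config item above concrete, here is a small sketch of how a minimum-healthy-hosts setting translates into the number of instances that can be offline at once. The function names are hypothetical helpers, not part of the Pulumi or AWS APIs, and the sketch assumes the rounding-up behavior for `FLEET_PERCENT` described in the CodeDeploy documentation.

```python
def required_healthy_hosts(kind: str, value: int, fleet_size: int) -> int:
    """Return how many instances must stay healthy during a deployment."""
    if kind == "HOST_COUNT":
        # The value is an absolute number of instances.
        return value
    if kind == "FLEET_PERCENT":
        # The percentage is converted to an instance count, rounding up
        # any fraction (integer ceiling division avoids float error).
        return (fleet_size * value + 99) // 100
    raise ValueError(f"unknown minimum healthy hosts type: {kind}")


def max_hosts_offline(kind: str, value: int, fleet_size: int) -> int:
    """Return how many instances can be taken offline at the same time."""
    required = required_healthy_hosts(kind, value, fleet_size)
    return max(fleet_size - required, 0)
```

For example, with a fleet of 9 instances and `FLEET_PERCENT` set to 75, seven instances must stay healthy, so at most two can be updated at a time.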

We'll focus on the `auto_rollback_configuration` setting of the `DeploymentGroup` resource because it determines which events trigger a rollback. We'll configure it so that a rollback starts when a deployment fails or is stopped.

Here's a Pulumi program that creates an AWS CodeDeploy application, a deployment configuration, and a deployment group with automatic rollback enabled:

```python
import pulumi
import pulumi_aws as aws

# Create an AWS CodeDeploy application
codedeploy_app = aws.codedeploy.Application("myAIServiceApp",
    compute_platform="Server",  # Choose 'Lambda' or 'ECS' depending on your compute platform.
)

# Create a deployment configuration that specifies the minimum number of healthy hosts
codedeploy_config = aws.codedeploy.DeploymentConfig("myAIServiceDeploymentConfig",
    minimum_healthy_hosts=aws.codedeploy.DeploymentConfigMinimumHealthyHostsArgs(
        type="HOST_COUNT",  # You can also use 'FLEET_PERCENT' based on your requirement
        value=1,  # The minimum number of healthy hosts or the percentage of the fleet
    ),
)

# Create a CodeDeploy deployment group with auto-rollback configuration on deployment failures
codedeploy_deployment_group = aws.codedeploy.DeploymentGroup("myAIServiceDeploymentGroup",
    app_name=codedeploy_app.name,
    deployment_group_name="my-deployment-group",
    deployment_config_name=codedeploy_config.deployment_config_name,
    service_role_arn="arn:aws:iam::123456789012:role/ECSRole",  # Placeholder: replace with the ARN of an IAM role that CodeDeploy can assume
    auto_rollback_configuration=aws.codedeploy.DeploymentGroupAutoRollbackConfigurationArgs(
        enabled=True,
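        # DEPLOYMENT_STOP_ON_ALARM only takes effect once an alarm_configuration
        # is attached to this deployment group.
        events=["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM", "DEPLOYMENT_STOP_ON_REQUEST"],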
    ),
    # The below settings depend on your target - instances, ECS services, Lambda functions, etc.
    # ec2_tag_filters=[...],  # Uncomment and configure if deploying to EC2 instances
    # ecs_service=...,  # Uncomment and configure if deploying to ECS
    # on_premises_instance_tag_filters=[...],  # Uncomment and configure if deploying to on-premises instances
)

# Export the deployment group's name
pulumi.export("deployment_group_name", codedeploy_deployment_group.deployment_group_name)
```

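This program enables automated rollbacks through the `auto_rollback_configuration` parameter, which lists the events that trigger a rollback: a failed deployment, a deployment stopped by an alarm, or a deployment stopped on request. The `DeploymentConfig` sets the minimum number of hosts that must remain healthy during the deployment; if the count drops below that minimum, the deployment fails and the `DEPLOYMENT_FAILURE` event triggers a rollback. Note that `DEPLOYMENT_STOP_ON_ALARM` has an effect only if CloudWatch alarms are associated with the deployment group through `alarm_configuration`, which this program does not set.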
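
For the alarm-based trigger to do anything, the deployment group needs at least one CloudWatch alarm attached. The following is a minimal sketch of that wiring, not a definitive setup: the metric name `InferenceErrors`, the namespace `MyAIService`, and the threshold are hypothetical placeholders that you would replace with a metric your AI service actually publishes.

```python
import pulumi_aws as aws

# Hypothetical alarm on an error metric for the AI service; the namespace,
# metric name, period, and threshold are placeholders.
error_alarm = aws.cloudwatch.MetricAlarm("aiServiceErrorAlarm",
    comparison_operator="GreaterThanThreshold",
    evaluation_periods=2,
    metric_name="InferenceErrors",
    namespace="MyAIService",
    period=60,
    statistic="Sum",
    threshold=5,
    alarm_description="Too many inference errors after a deployment",
)

# Alarm settings to pass to the deployment group.
alarm_configuration = aws.codedeploy.DeploymentGroupAlarmConfigurationArgs(
    enabled=True,
    alarms=[error_alarm.name],
)
```

Passing this object as `alarm_configuration=alarm_configuration` to the `DeploymentGroup` lets CodeDeploy stop a deployment when the alarm fires, at which point the `DEPLOYMENT_STOP_ON_ALARM` event starts the rollback.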

Keep in mind that this is a high-level overview; real deployment setups are often more complex and involve load balancers, health checks, and similar concerns. CodeDeploy also needs appropriate IAM permissions, granted through its service role, to operate on your AWS resources.
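
As an illustration of the IAM side, the sketch below creates a service role that CodeDeploy can assume and attaches the AWS managed policy for EC2/on-premises deployments. The resource names are hypothetical, and the managed policy is an assumption that fits the `Server` compute platform; ECS and Lambda deployments use their own managed policies (`AWSCodeDeployRoleForECS` and `AWSCodeDeployRoleForLambda`).

```python
import json

import pulumi_aws as aws

# Trust policy that lets the CodeDeploy service assume the role.
assume_role_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "codedeploy.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
})

# Hypothetical resource name; pick one that matches your project.
codedeploy_role = aws.iam.Role("myAIServiceCodeDeployRole",
    assume_role_policy=assume_role_policy,
)

# AWS managed policy for deployments to EC2 and on-premises instances.
aws.iam.RolePolicyAttachment("myAIServiceCodeDeployRolePolicy",
    role=codedeploy_role.name,
    policy_arn="arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole",
)
```

With a role like this in place, `service_role_arn=codedeploy_role.arn` can replace the hard-coded placeholder ARN in the deployment group.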

For more information on AWS CodeDeploy with Pulumi:
- [AWS CodeDeploy DeploymentConfig](https://www.pulumi.com/registry/packages/aws/api-docs/codedeploy/deploymentconfig/)
- [AWS CodeDeploy DeploymentGroup](https://www.pulumi.com/registry/packages/aws/api-docs/codedeploy/deploymentgroup/)

After you deploy this configuration with Pulumi, a faulty deployment causes AWS CodeDeploy to roll back automatically by redeploying the last known good revision as a new deployment, which reduces downtime and helps keep the service stable.
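
To confirm that a rollback actually happened, you can inspect the deployment details that CodeDeploy reports. The helper below is a hypothetical sketch that works on a dictionary shaped like the `deploymentInfo` object of the CodeDeploy `GetDeployment` API; the sample data is hand-written for illustration and is not real output.

```python
from typing import Any, Mapping, Optional


def rollback_deployment_id(deployment_info: Mapping[str, Any]) -> Optional[str]:
    """Return the ID of the rollback deployment this deployment triggered, if any."""
    rollback_info = deployment_info.get("rollbackInfo") or {}
    return rollback_info.get("rollbackDeploymentId")


# Hypothetical sample; real data would come from the GetDeployment API.
sample = {
    "deploymentId": "d-EXAMPLE11",
    "status": "Failed",
    "rollbackInfo": {"rollbackDeploymentId": "d-EXAMPLE22"},
}

if __name__ == "__main__":
    print(rollback_deployment_id(sample))
```

A return value of `None` means the deployment did not trigger a rollback, either because it succeeded or because no configured rollback event occurred.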