1. Model Performance Monitoring with Amazon SageMaker Model Monitor

    Setting up model performance monitoring with Amazon SageMaker Model Monitor using Pulumi in Python involves several steps. SageMaker Model Monitor watches machine learning models in production and automatically detects and reports issues such as data drift, which can signal that a model should be retrained because its performance is degrading.

    Let’s walk through what is needed to set up Model Monitoring for a SageMaker endpoint.

    1. Data Quality Job Definition: The first resource we'll create is a DataQualityJobDefinition. It defines the monitoring job itself: the analyzer image, the compute resources, the input and output locations, and the baseline statistics and constraints that incoming data is compared against.

    2. Monitoring Schedule: After setting up the job definition, define a MonitoringSchedule that references it. The schedule establishes how often the monitoring job is executed.

    3. Notification Configuration: If data drift or other anomalies are detected, you likely want to be notified. This can be done by setting up an Amazon SNS topic that receives notifications from CloudWatch alarms on the metrics SageMaker Model Monitor publishes.

    Program Explanation

    • DataQualityJobDefinition: Sets up a definition of the data quality monitoring job. You will specify roles, resources (like instance type and count), and the network configuration for VPC if required.

    • MonitoringSchedule: Creates a schedule for the monitoring job, specifying how often monitoring should occur and which job definition to run.

    • SNS Topic & CloudWatch Alarm: Optionally, you can create an SNS topic so that you are alerted when a CloudWatch alarm on Model Monitor metrics fires, indicating that an issue has been detected.
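    The role referenced by the job definition needs access to the monitoring data and to CloudWatch. The following is a minimal sketch of such a policy document, not an authoritative permission list: the function name monitoring_role_policy is hypothetical, and the chosen set of actions (S3 read/write, CloudWatch metrics, CloudWatch Logs) is an assumption about what a typical monitoring job requires.

```python
import json


def monitoring_role_policy(bucket: str) -> str:
    """Build an IAM policy document (JSON string) for a monitoring role.

    The action list is an assumption, not an official minimum; tighten the
    resources and actions to match your own setup.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read captured data and baselines, write monitoring reports.
                "Sid": "ReadWriteMonitoringData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Listing is granted on the bucket itself, not on its objects.
                "Sid": "ListMonitoringBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Publish metrics and write job logs.
                "Sid": "PublishMetricsAndLogs",
                "Effect": "Allow",
                "Action": [
                    "cloudwatch:PutMetricData",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": ["*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(monitoring_role_policy("bucketname"))
```

    In a Pulumi program, the returned string could be passed as the policy argument of an aws.iam.RolePolicy attached to the execution role.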

    Below is the Pulumi program that sets up model performance monitoring with SageMaker Model Monitor:

    import pulumi
    import pulumi_aws as aws
    import pulumi_aws_native as aws_native

    sm = aws_native.sagemaker

    # Replace these placeholders with your own values.
    sagemaker_execution_role_arn = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
    # Placeholder URI for the SageMaker Model Monitor analyzer image;
    # use the registry account ID that AWS publishes for your region.
    image_uri = "123456789012.dkr.ecr.us-west-2.amazonaws.com/sagemaker-model-monitor-analyzer"
    # Name of the deployed endpoint to monitor (placeholder added here because
    # the job definition requires an input source).
    endpoint_name = "my-endpoint"

    # The nested argument class names follow the aws-native provider's
    # resource-prefixed naming; verify them against your installed provider version.

    # Define the Data Quality Job Definition
    data_quality_job_definition = sm.DataQualityJobDefinition(
        "dataQualityJobDef",
        role_arn=sagemaker_execution_role_arn,
        job_resources=sm.DataQualityJobDefinitionMonitoringResourcesArgs(
            cluster_config=sm.DataQualityJobDefinitionClusterConfigArgs(
                instance_count=1,
                instance_type="ml.m5.xlarge",
                volume_size_in_gb=30,
            ),
        ),
        data_quality_app_specification=sm.DataQualityJobDefinitionDataQualityAppSpecificationArgs(
            image_uri=image_uri,
        ),
        # Where the job reads captured request/response data from.
        data_quality_job_input=sm.DataQualityJobDefinitionDataQualityJobInputArgs(
            endpoint_input=sm.DataQualityJobDefinitionEndpointInputArgs(
                endpoint_name=endpoint_name,
                local_path="/opt/ml/processing/input",
            ),
        ),
        data_quality_job_output_config=sm.DataQualityJobDefinitionMonitoringOutputConfigArgs(
            monitoring_outputs=[
                sm.DataQualityJobDefinitionMonitoringOutputArgs(
                    s3_output=sm.DataQualityJobDefinitionS3OutputArgs(
                        s3_uri="s3://bucketname/output",
                        local_path="/opt/ml/processing/output",
                        s3_upload_mode="Continuous",
                    ),
                ),
            ],
        ),
        data_quality_baseline_config=sm.DataQualityJobDefinitionDataQualityBaselineConfigArgs(
            statistics_resource=sm.DataQualityJobDefinitionStatisticsResourceArgs(
                s3_uri="s3://bucketname/statistics.json",
            ),
            constraints_resource=sm.DataQualityJobDefinitionConstraintsResourceArgs(
                s3_uri="s3://bucketname/constraints.json",
            ),
        ),
    )

    # Arbitrarily, let's assume monitoring every hour
    monitoring_schedule_cron = "cron(0 * ? * * *)"

    # Define the monitoring schedule that runs the data quality job
    model_quality_monitoring_schedule = sm.MonitoringSchedule(
        "modelQualitySchedule",
        monitoring_schedule_name="MyModelQualitySchedule",
        monitoring_schedule_config=sm.MonitoringScheduleConfigArgs(
            monitoring_type="DataQuality",
            schedule_config=sm.MonitoringScheduleScheduleConfigArgs(
                schedule_expression=monitoring_schedule_cron,
            ),
            monitoring_job_definition_name=data_quality_job_definition.job_definition_name,
        ),
    )

    # Create an SNS topic to send notifications
    sns_topic = aws.sns.Topic("monitoringSnsTopic")

    # Define the SNS subscription, e.g., an email address
    email_subscription = aws.sns.TopicSubscription(
        "monitoringEmailSubscription",
        protocol="email",
        endpoint="your-email@example.com",  # replace with your email to receive notifications
        topic=sns_topic.arn,
    )

    # Export the ARN of the SNS topic to use it in CloudWatch alarms or other resources
    pulumi.export("sns_topic_arn", sns_topic.arn)

    Note

    • Replace sagemaker_execution_role_arn, image_uri, s3_uri for statistics and constraints, and your-email@example.com with your actual values.
    • Ensure that the S3 bucket and paths you specify for output, statistics, and constraints are correct and that SageMaker has permissions to access them.
    • Make sure the role associated with SageMaker Execution (sagemaker_execution_role_arn) has the necessary permissions.
    • The schedule expression is set to run every hour. Adjust the cron expression as per your monitoring requirements.
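    To adjust the schedule, it can help to generate the expression rather than hand-edit it. The sketch below builds the three cron shapes that, to my understanding, SageMaker monitoring schedules accept (hourly, daily at a fixed hour, and every N hours from a start hour). The function names are hypothetical, and the accepted patterns should be confirmed against the current SageMaker documentation.

```python
def hourly_schedule() -> str:
    """Run at the top of every hour."""
    return "cron(0 * ? * * *)"


def daily_schedule(hour: int) -> str:
    """Run once a day at the given UTC hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return f"cron(0 {hour} ? * * *)"


def every_n_hours_schedule(n: int, start_hour: int = 0) -> str:
    """Run every n hours (1-24), starting at start_hour (0-23, UTC)."""
    if not 1 <= n <= 24:
        raise ValueError("n must be between 1 and 24")
    if not 0 <= start_hour <= 23:
        raise ValueError("start_hour must be between 0 and 23")
    return f"cron(0 {start_hour}/{n} ? * * *)"


if __name__ == "__main__":
    print(hourly_schedule())
    print(daily_schedule(12))
    print(every_n_hours_schedule(6, start_hour=17))
```

    The returned string can be assigned to monitoring_schedule_cron in the program above. Validating the ranges up front surfaces a bad schedule before deployment rather than as a provider error.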

    For more information about the AWS-native provider resources, you can refer to the Pulumi AWS-native Provider Documentation.

    The above program sets up the data quality job definition and a schedule to execute that job. It also creates an SNS topic with an email subscription. The topic does not receive anything by itself: to get alerts, attach it as an action to CloudWatch alarms on the metrics that SageMaker Model Monitor publishes.
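    As an illustration of that last step, the sketch below assembles the arguments for one drift alarm as a plain dictionary. The helper name drift_alarm_args, the feature name, and the threshold are hypothetical. The namespace, metric name pattern, and dimension names reflect my understanding of what Model Monitor emits for real-time endpoints when CloudWatch metric publishing is enabled, so verify them in the CloudWatch console before relying on them.

```python
from typing import Any, Dict


def drift_alarm_args(
    endpoint_name: str,
    schedule_name: str,
    feature_name: str,
    topic_arn: Any,
    threshold: float = 0.1,
) -> Dict[str, Any]:
    """Return keyword arguments for a CloudWatch metric alarm on feature drift.

    topic_arn is typed as Any so that a Pulumi Output can be passed through.
    """
    return {
        # Assumed namespace and metric naming for data quality metrics.
        "namespace": "aws/sagemaker/Endpoints/data-metrics",
        "metric_name": f"feature_baseline_drift_{feature_name}",
        "dimensions": {
            "Endpoint": endpoint_name,
            "MonitoringSchedule": schedule_name,
        },
        "statistic": "Average",
        "period": 3600,  # seconds; matches the hourly schedule
        "evaluation_periods": 1,
        "comparison_operator": "GreaterThanThreshold",
        "threshold": threshold,
        # Do not alarm during hours in which no monitoring job reported.
        "treat_missing_data": "notBreaching",
        "alarm_actions": [topic_arn],
    }


if __name__ == "__main__":
    args = drift_alarm_args(
        "my-endpoint",
        "MyModelQualitySchedule",
        "age",
        "arn:aws:sns:us-west-2:123456789012:monitoringSnsTopic",
    )
    print(args["metric_name"])
```

    In the Pulumi program, the dictionary could be unpacked into a pulumi_aws alarm resource, for example aws.cloudwatch.MetricAlarm("driftAlarm", **drift_alarm_args(..., topic_arn=sns_topic.arn)). One alarm is needed per monitored feature.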