Comparing Model Versions with Evidently in AWS ML Workflows

Question

Pulumi · Accepted Answer

On AWS, Amazon SageMaker is commonly used for machine learning (ML) workflows. Amazon CloudWatch Evidently (referred to below as Evidently) is a feature-management service for running launches and A/B experiments on application features. It is not ML-specific, but you can use it to compare model versions: expose each version as a variation of a feature, serve the variations to different users, and let the experiment determine which one performs best against metrics that you define.

Below is a Pulumi program written in Python that creates a SageMaker Model Package Group, which holds the versions of a machine learning model, together with the Evidently resources (project, segment, feature, and experiment) used to compare the performance of those versions.

First, we create a Model Package Group in SageMaker, where the model versions are registered. Then we create an Evidently project, a segment that defines the audience for the experiment, and a feature whose variations stand for the model versions. Finally, we create an experiment that compares the variations; its metric goals define the metrics used to evaluate model performance.

```python
import pulumi
import pulumi_aws as aws
# The classic AWS provider has no Evidently Experiment resource, so the experiment
# below uses the AWS Native provider (CloudFormation type AWS::Evidently::Experiment).
# Its argument names follow the CloudFormation schema; check them against the
# provider version you have installed.
import pulumi_aws_native as aws_native

# Create a SageMaker Model Package Group to manage the model versions.
model_package_group = aws.sagemaker.ModelPackageGroup("modelPackageGroup",
    model_package_group_name="MyModelPackageGroup",
    model_package_group_description="Group of model versions for comparison")

# Create an Evidently project to hold the feature and the experiment.
evidently_project = aws.evidently.Project("evidentlyProject",
    name="MyEvidentlyProject")

# Define a segment that limits the audience for the experiment.
# Segments are not scoped to a project, so there is no `project` argument.
evidently_segment = aws.evidently.Segment("evidentlySegment",
    name="MySegment",
    pattern='{"userTier": ["beta"]}')  # Placeholder attribute; replace with your own pattern.

# Define a feature whose variations stand for the two model versions.
# The string values are placeholder labels that your application maps to real versions.
evidently_feature = aws.evidently.Feature("evidentlyFeature",
    project=evidently_project.name,
    name="ModelOutputFeature",
    variations=[
        {"name": "Control", "value": {"string_value": "model-v1"}},
        {"name": "Treatment", "value": {"string_value": "model-v2"}},
    ],
    default_variation="Control")

# Finally, define an experiment that compares the two variations of the feature.
evidently_experiment = aws_native.evidently.Experiment("evidentlyExperiment",
    project=evidently_project.name,
    name="ModelComparisonExperiment",
    description="Experiment to compare different model versions",
    segment=evidently_segment.name,
    treatments=[
        {"feature": evidently_feature.name, "variation": "Control", "treatment_name": "VersionA"},
        {"feature": evidently_feature.name, "variation": "Treatment", "treatment_name": "VersionB"},
    ],
    # Traffic split between treatments, in thousandths of a percent (50000 = 50%).
    online_ab_config={
        "control_treatment_name": "VersionA",
        "treatment_weights": [
            {"treatment": "VersionA", "split_weight": 50000},
            {"treatment": "VersionB", "split_weight": 50000},
        ],
    },
    metric_goals=[{
        "metric_name": "SuccessMetric",
        "entity_id_key": "userId",    # Event field that identifies the user.
        "value_key": "success",       # Event field that holds the metric value.
        "desired_change": "INCREASE",
    }])

pulumi.export('ModelPackageGroupName', model_package_group.model_package_group_name)
pulumi.export('EvidentlyProjectName', evidently_project.name)
pulumi.export('EvidentlyExperimentName', evidently_experiment.name)
```

In this example:

- We first create a `ModelPackageGroup` in SageMaker, which is a container for the versions of a model. Grouping the versions keeps them organized and makes it easy to refer to a specific version when comparing them. The `model_package_group_name` parameter is required and uniquely identifies the Model Package Group.

- We then set up an Evidently `Project`. Projects organize the resources needed for running experiments, such as features and the experiments themselves.

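- The `Segment` resource in Evidently defines a subset of your users based on criteria you specify. Its `pattern` is a JSON rule that is matched against the attributes your application supplies when it evaluates a feature. The pattern in the example is only a placeholder; in a real-world scenario, you'd specify a `pattern` that matches the subset of application users you want in the experiment.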

- We define a `Feature` in Evidently. A feature has one or more variations, and here each variation stands for one of the model versions you want to test. The `default_variation` is what users receive when they are not assigned to the experiment.

- The `Experiment` is where we actually test our different feature variations. We define `treatments` that describe the different feature variations (control and treatment groups in this case) and `metric_goals` to specify the success criteria for the experiment.

By exporting resource names, we make it possible to retrieve these names outside of Pulumi for further use or reference.

In an actual implementation, you need to specify your segment pattern correctly and make sure the metric goal matches the events your application sends: `entity_id_key` names the event field that identifies the user, and `value_key` names the field that holds the metric value. The `desired_change` field in `metric_goals` tells Evidently whether an increase or a decrease in the metric counts as an improvement (this could relate to user click-through rates, conversion rates, etc., depending on your application).
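To make the link between `metric_goals` and event data concrete, here is a minimal sketch of how an application might build and send a custom event whose fields match `entity_id_key="userId"` and `value_key="success"`. The helper names `build_success_event` and `send_events` are hypothetical, and encoding the outcome as `1.0`/`0.0` is an assumption made so that the metric value is numeric. The `put_project_events` call is the boto3 Evidently client operation; it needs boto3 and AWS credentials, so it is kept separate from the pure helper.

```python
import json
from datetime import datetime, timezone


def build_success_event(user_id: str, success: bool) -> dict:
    """Build one custom event whose JSON body matches the metric goal keys."""
    payload = {
        "userId": user_id,                   # read through entity_id_key="userId"
        "success": 1.0 if success else 0.0,  # read through value_key="success" (assumed numeric encoding)
    }
    return {
        "timestamp": datetime.now(timezone.utc),
        "type": "aws.evidently.custom",
        "data": json.dumps(payload),         # the event body is sent as a JSON string
    }


def send_events(project_name: str, events: list) -> None:
    """Send events to Evidently. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("evidently")
    client.put_project_events(project=project_name, events=events)
```

A typical call would be `send_events("MyEvidentlyProject", [build_success_event("user-123", True)])`, made after the user has been served a model version and the outcome is known.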

This program only sets up the infrastructure you'll use to compare model versions with Evidently in AWS. To actually run the experiment and collect data, you still need to start the experiment and integrate your application with these resources. This involves further application code and AWS setup: the application asks Evidently which variation to serve to each user, and it sends event data back to Evidently for analysis.
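As a rough sketch of the application side, the snippet below asks Evidently which variation a user should receive and maps that variation to a model version. The `MODEL_VERSIONS` mapping, its ARNs (account ID, region, and version numbers), and the helper names are placeholders invented for illustration. The `evaluate_feature` call is the boto3 Evidently client operation and needs boto3 and AWS credentials.

```python
# Hypothetical mapping from Evidently variation name to a SageMaker model package ARN.
# Replace the account ID, region, and version numbers with your own.
MODEL_VERSIONS = {
    "Control": "arn:aws:sagemaker:us-east-1:123456789012:model-package/MyModelPackageGroup/1",
    "Treatment": "arn:aws:sagemaker:us-east-1:123456789012:model-package/MyModelPackageGroup/2",
}


def pick_model_version(variation: str, default: str = "Control") -> str:
    """Return the model package ARN for a variation, falling back to the default."""
    return MODEL_VERSIONS.get(variation, MODEL_VERSIONS[default])


def evaluate_variation(project: str, feature: str, user_id: str) -> str:
    """Ask Evidently which variation this user gets. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the mapping logic can be used without boto3

    client = boto3.client("evidently")
    response = client.evaluate_feature(project=project, feature=feature, entityId=user_id)
    return response["variation"]
```

In a request handler you would call `pick_model_version(evaluate_variation("MyEvidentlyProject", "ModelOutputFeature", user_id))` and route the inference request to the returned model version. Falling back to the control version for unknown variations keeps the application working if the feature definition changes.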