Real-time Inference Endpoints with Amazon SageMaker

Question

Pulumi · Accepted Answer

To create a real-time inference endpoint with Amazon SageMaker using Pulumi, your Pulumi program needs to define three resources in order. Below is a walkthrough of those steps, followed by the complete program written in Python.

1. **Model Deployment with SageMaker**: First, we register a trained machine learning model with SageMaker. We define a SageMaker model resource that points to the S3 location of the model artifacts and to the Docker container image that serves the model.

2. **Endpoint Configuration**: Next, we must set up an endpoint configuration. This includes specifying the type and number of instances to use for real-time inference.

3. **Endpoint Creation**: Finally, we create a SageMaker endpoint that references the earlier configuration. This endpoint is what applications will use to perform real-time inference with the deployed machine learning model.

Let's see how this translates into Pulumi code:

```python
import pulumi
import pulumi_aws as aws

# Assuming you already have a trained model and a Docker image that serves the model ready for use.
# The Docker image should be in ECR and the model artifacts in an S3 bucket.

# 1. Define the SageMaker model, pointing to the model artifacts in S3 and the Docker image in ECR.
model = aws.sagemaker.Model("myModel",
    execution_role_arn="arn:aws:iam::123456789012:role/SageMakerExecutionRole", # Replace with your SageMaker execution role ARN
    primary_container={
        "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-model-container:latest", # Replace with your ECR image URL
        "model_data_url": "s3://my-model-bucket/model.tar.gz" # Replace with your S3 URL to the model artifacts
    }
)

# 2. Set up the endpoint configuration with the desired instance type and other properties.
endpoint_config = aws.sagemaker.EndpointConfiguration("myEndpointConfig",
    production_variants=[{
        "variant_name": "variant-1",
        "model_name": model.name,
        "initial_instance_count": 1,
        "instance_type": "ml.m5.large"
    }]
)

# 3. Create the SageMaker endpoint using the above endpoint configuration.
endpoint = aws.sagemaker.Endpoint("myEndpoint",
    endpoint_config_name=endpoint_config.name
)

# Export the SageMaker endpoint name (the Endpoint resource exposes it as `name`)
pulumi.export('sagemaker_endpoint_name', endpoint.name)
```

In the above program:

- **SageMaker Model**: We defined an `aws.sagemaker.Model` with the necessary parameters: the execution role ARN, the image URL of the Docker container, and the model data URL pointing to the S3 bucket where the model artifacts are stored.

- **Endpoint Configuration**: We then set up an `aws.sagemaker.EndpointConfiguration` with a single entry in `production_variants` that references our model by name. The entry sets the initial number of instances (`initial_instance_count`) and the instance type (`instance_type`). Adjust both to match the performance and cost requirements of your use case.

- **Endpoint Creation**: Finally, we create an `aws.sagemaker.Endpoint` that uses the endpoint configuration. This is the endpoint that applications call for real-time inference.

- **Exports**: At the end of the program, the created SageMaker endpoint name is exported. This is especially useful for CI/CD environments or when integrating this Pulumi stack with other stacks.
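If the instance type and count need to differ between environments, one option is to build the production variant from a small helper instead of hard-coding it. The sketch below is only an illustration: `production_variant` is a hypothetical helper, not part of Pulumi or the AWS provider, and the `ml.` prefix check is a simple sanity check rather than a full validation of SageMaker instance types.

```python
def production_variant(model_name, instance_type="ml.m5.large",
                       instance_count=1, variant_name="variant-1"):
    """Build one entry for `production_variants` (hypothetical helper).

    `model_name` is passed through untouched, so in a Pulumi program it
    can be a plain string or an output such as `model.name`.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if not instance_type.startswith("ml."):
        # SageMaker hosting instance types are named like "ml.m5.large".
        raise ValueError(f"unexpected instance type: {instance_type!r}")
    return {
        "variant_name": variant_name,
        "model_name": model_name,
        "initial_instance_count": instance_count,
        "instance_type": instance_type,
    }


# Possible use inside the Pulumi program (assumes stack config keys
# named "instanceType" and "instanceCount", which are illustrative):
#   cfg = pulumi.Config()
#   variant = production_variant(
#       model.name,
#       instance_type=cfg.get("instanceType") or "ml.m5.large",
#       instance_count=cfg.get_int("instanceCount") or 1,
#   )
#   endpoint_config = aws.sagemaker.EndpointConfiguration(
#       "myEndpointConfig", production_variants=[variant])
```

Keeping the values in stack configuration lets a development stack run on one small instance while a production stack uses a larger fleet, without changing the program itself.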

Make sure you have the appropriate AWS credentials and permissions set up to run this Pulumi program. You should also have the AWS Pulumi provider configured appropriately for your account and region.
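Once the stack is deployed, an application with valid AWS credentials can call the endpoint through the SageMaker runtime API. The following is a minimal sketch, assuming the serving container accepts and returns JSON; the function name `invoke_endpoint_json` and the payload shape are illustrative, and the real request format depends on the container you deployed.

```python
import json


def invoke_endpoint_json(client, endpoint_name, payload):
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply.

    `client` is expected to be a boto3 "sagemaker-runtime" client (or any
    object with a compatible `invoke_endpoint` method).
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: container accepts JSON
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())


# Possible use (requires boto3 and AWS credentials; the payload is a
# placeholder, and the endpoint name comes from the stack export):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime", region_name="us-west-2")
#   name = "<value of the sagemaker_endpoint_name stack output>"
#   print(invoke_endpoint_json(runtime, name, {"instances": [[1.0, 2.0]]}))
```

The endpoint name can be read from the deployed stack with `pulumi stack output sagemaker_endpoint_name`.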