1. Serverless API Front-end for ML Model Inference.


    To create a serverless API that serves as a front-end for Machine Learning (ML) model inference, you can combine a serverless compute service with a managed ML service: AWS Lambda handles the request processing, while Amazon SageMaker hosts the ML model. Here's a step-by-step explanation followed by a Pulumi Python program that sets up the necessary resources:

    1. Amazon SageMaker Model: You need to have a machine learning model that is trained and ready for deployment. For this, you can use Amazon SageMaker which provides a platform to build, train, and deploy machine learning models.

    2. SageMaker Endpoint Configuration: Once your model is ready, you need an endpoint configuration. This specifies how the model is served, such as the instance type and the number of instances used for inference.

    3. SageMaker Endpoint: This endpoint exposes an HTTPS URL that serves as the entry point for your ML model. When you invoke this endpoint, SageMaker runs your model and returns the inference results.

    4. AWS Lambda Function: To create a serverless API, you can create a Lambda function which gets triggered upon HTTP requests. This Lambda function is then responsible for parsing the request, invoking the SageMaker Endpoint, and returning the response to the client.

    5. Amazon API Gateway: To expose your Lambda function to the web, you use Amazon API Gateway, which can map HTTP requests to your Lambda function and also handle tasks like user authentication, rate limiting, and more.
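
    The Lambda function described in step 4 could be sketched as follows. This is a minimal, hypothetical `lambda/handler.py`; the JSON payload shape (`{"features": [...]}`) and the `application/json` content type are assumptions you would adapt to whatever your model container actually expects:

```python
# lambda/handler.py -- hypothetical handler; adapt the payload shape to your model
import json
import os


def parse_features(body):
    """Extract the 'features' list from a JSON request body."""
    data = json.loads(body or "{}")
    return data.get("features", [])


def main(event, context):
    # boto3 is available in the Lambda runtime; it is imported lazily here so
    # the module can also be loaded (and unit-tested) outside AWS.
    import boto3

    features = parse_features(event.get("body"))
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=os.environ["SAGEMAKER_ENDPOINT_NAME"],
        ContentType="application/json",  # assumed; must match your model container
        Body=json.dumps({"instances": [features]}),
    )
    result = response["Body"].read().decode("utf-8")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": result,
    }
```

    With the `AWS_PROXY` integration used below, API Gateway passes the raw HTTP body in `event["body"]` and expects the handler to return a status code, headers, and body in this shape.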

    Now, let's create the necessary resources using Pulumi:

    import pulumi
    import pulumi_aws as aws

    # Define the SageMaker model, assuming a model package is already available.
    sagemaker_model = aws.sagemaker.Model("model",
        execution_role_arn="<ROLE_ARN>",  # ARN of the role to access the model package
        primary_container={
            "image": "<IMAGE>",  # Your ML model Docker image
            "model_data_url": "<MODEL_DATA_URL>",  # Path to the trained ML model artifacts
        },
    )

    # Create a SageMaker endpoint configuration
    endpoint_config = aws.sagemaker.EndpointConfiguration("endpointConfig",
        production_variants=[{
            "instance_type": "ml.m5.large",
            "model_name": sagemaker_model.name,
            "initial_instance_count": 1,
            "variant_name": "AllTraffic",
        }],
    )

    # Deploy the SageMaker endpoint
    endpoint = aws.sagemaker.Endpoint("endpoint",
        endpoint_config_name=endpoint_config.name,
    )

    # Define the Lambda function that will invoke the SageMaker endpoint
    lambda_function = aws.lambda_.Function("function",
        role="<LAMBDA_ROLE_ARN>",  # ARN of an IAM role with permissions to access SageMaker and logs
        runtime="python3.12",
        handler="handler.main",  # Assuming you have a handler.py with a main function
        code=pulumi.FileArchive("./lambda"),  # Directory containing your Lambda function code
        environment={
            "variables": {
                "SAGEMAKER_ENDPOINT_NAME": endpoint.name,
            },
        },
    )

    # Create an HTTP API Gateway to trigger the Lambda function
    api_gateway = aws.apigatewayv2.Api("api",
        protocol_type="HTTP",
        route_selection_expression="$request.method $request.path",
    )

    # Create an integration to connect the API Gateway with the Lambda function
    integration = aws.apigatewayv2.Integration("integration",
        api_id=api_gateway.id,
        integration_type="AWS_PROXY",
        integration_uri=lambda_function.invoke_arn,
    )

    # Create a route for POST requests that points at the integration
    post_route = aws.apigatewayv2.Route("postRoute",
        api_id=api_gateway.id,
        route_key="POST /predict",
        target=pulumi.Output.concat("integrations/", integration.id),
    )

    # Create a default stage so the API is actually deployed and reachable
    stage = aws.apigatewayv2.Stage("stage",
        api_id=api_gateway.id,
        name="$default",
        auto_deploy=True,
    )

    # Grant API Gateway permission to invoke the Lambda function
    permission = aws.lambda_.Permission("apiPermission",
        action="lambda:InvokeFunction",
        function=lambda_function.name,
        principal="apigateway.amazonaws.com",
        source_arn=pulumi.Output.concat(api_gateway.execution_arn, "/*/*"),
    )

    # Export the HTTPS endpoint of the API Gateway
    pulumi.export("api_url", api_gateway.api_endpoint)

    In the above program:

    • Replace <ROLE_ARN> with the ARN of the IAM Role that grants access to your SageMaker model package.
    • Replace <IMAGE> with the Docker image URL of your ML model.
    • Replace <MODEL_DATA_URL> with the S3 URL of your model artifacts.
    • Replace <LAMBDA_ROLE_ARN> with the ARN of the IAM Role that has permissions for AWS Lambda to access Amazon SageMaker and write logs.
    • The Lambda function code needs to be packaged in a directory called lambda alongside your Pulumi program, with an entry file handler.py that defines a function main.
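
    Instead of pasting a pre-existing ARN into <LAMBDA_ROLE_ARN>, you could also let Pulumi create the role. The following is a sketch, not a hardened policy: the two managed policy ARNs are standard AWS ones, and AmazonSageMakerFullAccess is broader than the Lambda actually needs, so you would scope it down in production. The resource creation is wrapped in a helper function here purely for illustration; in your program you would call it (or inline its body) and pass `role.arn` where `<LAMBDA_ROLE_ARN>` appears:

```python
import json

# Trust policy allowing the Lambda service to assume the role
LAMBDA_ASSUME_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def create_lambda_role():
    """Create an IAM role for the Lambda function (call this from your Pulumi program)."""
    import pulumi_aws as aws

    role = aws.iam.Role("lambdaRole",
        assume_role_policy=json.dumps(LAMBDA_ASSUME_ROLE_POLICY),
    )
    # Basic execution (CloudWatch Logs) plus SageMaker permissions.
    aws.iam.RolePolicyAttachment("lambdaLogs",
        role=role.name,
        policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    )
    aws.iam.RolePolicyAttachment("lambdaSageMaker",
        role=role.name,
        policy_arn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    )
    return role
```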

    This setup creates a serverless architecture with AWS Lambda and Amazon SageMaker, exposing an HTTP API endpoint for making predictions from your machine learning model.
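
    After running pulumi up, the exported api_url can be called from any HTTP client. The sketch below uses only the Python standard library; the URL and feature values are placeholders, and the {"features": [...]} payload shape is an assumption that must match whatever your Lambda handler parses:

```python
import json
import urllib.request


def build_prediction_request(api_url, features):
    """Build a POST request for the /predict route of the deployed API."""
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (replace the URL with the real value of the exported api_url):
req = build_prediction_request(
    "https://example.execute-api.us-east-1.amazonaws.com", [1.0, 2.0, 3.0]
)
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```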