Model Inference API Hosting with API Gateway

Question

Pulumi · Accepted Answer

To host a model inference API, we can use Amazon API Gateway, which accepts incoming API requests, manages traffic, controls access and authorization, handles monitoring and API versioning, and more. API Gateway can act as a front door to applications running on AWS Lambda, Amazon ECS, or any HTTP endpoint.

In the Pulumi program below, we will create a simple API Gateway REST API that invokes an AWS Lambda function hosting our inference code. Here's how it can be done:

1. Set up an AWS Lambda Function: This function will run our inference model code.
2. Create an AWS API Gateway REST API: This will be our publicly accessible endpoint.
3. Define a `Method` for our `Resource`: This specifies the type of request (GET, POST, etc.) that the `/infer` path accepts and how it is authorized.
4. Define an `Integration` between the `Method` and the Lambda Function: This ensures that when our API endpoint is hit, it invokes the Lambda Function. API Gateway also needs permission to invoke the function.
5. Deploy our API using the `Deployment` and `Stage` resources: The deployment is a snapshot of the API, and the stage is a named reference to it (for example `v1`) that clients can invoke.

For this example, I'm assuming that `inference_handler.py` contains your inference model code and a Lambda-compatible handler function named `handler`.
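As a rough sketch of what such a handler might look like with a Lambda proxy integration, consider the following. The `predict` function and the `{"features": [...]}` request shape are hypothetical placeholders for illustration only; a real handler would load and call your actual model.

```python
import json


def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


def handler(event, context):
    """Lambda entry point for an API Gateway proxy (AWS_PROXY) integration."""
    try:
        # With proxy integration the request body arrives as a JSON string (or None).
        payload = json.loads(event.get("body") or "{}")
        # Assumed request shape: {"features": [1.0, 2.0, ...]}
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("features must not be empty")
    except (KeyError, TypeError, ValueError) as exc:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": f"invalid request: {exc}"}),
        }

    # A proxy integration expects a dict with statusCode, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": predict(features)}),
    }
```

Returning the `statusCode`/`body` dictionary matters here because, with a proxy integration, API Gateway passes the Lambda result back to the client without further mapping.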

Here's what the full Pulumi program looks like:

```python
import pulumi
import pulumi_aws as aws

# Assumes that you have inference_handler.py file with a 'handler' function
# which is compatible with AWS Lambda's Python runtime environment.
lambda_role = aws.iam.Role('lambdaRole', assume_role_policy="""{
   "Version": "2012-10-17",
   "Statement": [{
     "Action": "sts:AssumeRole",
     "Effect": "Allow",
     "Principal": {
       "Service": "lambda.amazonaws.com"
     }
   }]
}""")

policy_attachment = aws.iam.RolePolicyAttachment('lambdaPolicyAttachment',
    role=lambda_role.name,
    policy_arn=aws.iam.ManagedPolicy.AWS_LAMBDA_BASIC_EXECUTION_ROLE.value)

# Upload our model inference code to Lambda
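inference_lambda = aws.lambda_.Function('inferenceLambda',
    code=pulumi.FileArchive('./inference_handler.zip'),
    role=lambda_role.arn,
    # '<module name>.<function name>' inside the archive
    handler='inference_handler.handler',
    # python3.8 is an older runtime that AWS has deprecated on Lambda; consider
    # a currently supported runtime (for example 'python3.12') for new functions.
    runtime='python3.8')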

# API Gateway: REST API
rest_api = aws.apigateway.RestApi('modelInferenceApi',
    description='Endpoint for model inference')

# Resource for the '/infer' path, created under the API's root resource ('/')
resource = aws.apigateway.Resource('apiResource',
    parent_id=rest_api.root_resource_id,
    path_part='infer',
    rest_api=rest_api.id)

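# Method for invoking the API endpoint. It must exist before the integration
# that is attached to it.
method = aws.apigateway.Method('apiMethod',
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method='POST',
    authorization='NONE',
    api_key_required=False,
    request_parameters={'method.request.header.Content-Type': False})

# Lambda proxy integration. API Gateway always calls Lambda with POST,
# regardless of the HTTP method exposed to clients.
integration = aws.apigateway.Integration('lambdaIntegration',
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method=method.http_method,
    integration_http_method='POST',
    type='AWS_PROXY',
    uri=inference_lambda.invoke_arn)

# Allow API Gateway to invoke the Lambda function for POST /infer on any stage.
# Without this permission, requests to the endpoint fail with a server error.
invoke_permission = aws.lambda_.Permission('apiInvokePermission',
    action='lambda:InvokeFunction',
    function=inference_lambda.name,
    principal='apigateway.amazonaws.com',
    source_arn=pulumi.Output.concat(rest_api.execution_arn, '/*/POST/infer'))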

# Deployment of the API: a snapshot of its resources, methods, and integrations
deployment = aws.apigateway.Deployment('apiDeployment',
    rest_api=rest_api.id,
    # Note: a deployment does not pick up later changes to the API on its own.
    # To redeploy after changing resources or methods, consider setting the
    # `triggers` argument to a value that changes with the API definition.
    # Wait for the integration so the API is fully defined before deploying.
    opts=pulumi.ResourceOptions(depends_on=[integration]))

# Stage where our deployment is accessible
stage = aws.apigateway.Stage('apiStage',
    rest_api=rest_api.id,
    deployment=deployment.id,
    stage_name='v1')

# Export the HTTPS endpoint of the deployment
# (the stage's invoke_url already includes the API ID, region, and stage name)
pulumi.export('api_url', pulumi.Output.concat(stage.invoke_url, '/infer'))
```

In this program:
- We create an IAM role for our Lambda function with basic execution permissions.
- The Lambda function is then defined with Python 3.8 as the runtime and points to the `handler` function in the `inference_handler` module within our zipped code archive.
- Our API Gateway REST API is set up with a single resource and method for POST requests.
- An integration is defined so that incoming API requests invoke our Lambda function.
- We deploy our API, associating it with the Lambda integration, and create a stage named `v1` for access.
- The API Gateway endpoint URL is then exported as an output of our Pulumi program. You can use this URL to interact with your model inference API.

Not shown in this code snippet are the model code itself, in a file named `inference_handler.py`, and its packaging into a ZIP file named `inference_handler.zip`. The code in `inference_handler.py` should expose a function named `handler` that can handle Lambda events.
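One way to produce that archive is with Python's standard `zipfile` module. The sketch below assumes the handler has no third-party dependencies; the helper name `package_handler` is hypothetical, and a real model would usually also need its libraries and weights bundled into the archive or supplied through a Lambda layer.

```python
import zipfile
from pathlib import Path


def package_handler(source="inference_handler.py", archive="inference_handler.zip"):
    """Write the handler module into a ZIP archive at the archive root."""
    source_path = Path(source)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps the file at the top level of the archive, which is
        # where the 'inference_handler.handler' setting expects the module.
        zf.write(source_path, arcname=source_path.name)
    return archive
```

Calling `package_handler()` from the project directory before `pulumi up` should leave `inference_handler.zip` next to the Pulumi program, where `pulumi.FileArchive` looks for it.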

To run the Pulumi program, you will need Pulumi installed and AWS credentials configured. Then, run `pulumi up` to create the resources. After the program has finished, Pulumi will output the API URL you can use to send POST requests to your model inference endpoint.
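As an illustration of calling the deployed endpoint, here is a minimal client sketch using only the standard library. The function names and the `{"features": [...]}` payload are assumptions made for this example; the request body must match whatever your own handler expects.

```python
import json
import urllib.request


def build_inference_request(api_url, features):
    """Build a JSON POST request for the inference endpoint."""
    # Assumed payload shape; adjust it to what your handler parses.
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_inference_api(api_url, features, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_inference_request(api_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The URL to pass in can be read back at any time with `pulumi stack output api_url`.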