Model Inference API Hosting with API Gateway
To host a model inference API, we can use a cloud service like AWS API Gateway, which handles incoming API requests, manages traffic, controls access and authorization, monitors usage, manages API versions, and more. AWS API Gateway can act as a front door to applications running on AWS Lambda, Amazon ECS, or any HTTP endpoint.
In the Pulumi program below, we will create a simple AWS API Gateway REST API connected to an AWS Lambda function that hosts our inference code. Here's how it can be done:
- Set up an AWS Lambda Function: This function will run our inference model code.
- Create an AWS API Gateway REST API: This will be our publicly accessible endpoint.
- Define an `Integration` between the REST API and the Lambda Function: This ensures that when our API endpoint is hit, it invokes the Lambda Function.
- Define a `Method` for our `Resource`: This specifies the type of request (GET, POST, etc.) and links to our Lambda integration.
- Deploy our API using the `Stage` resource: This represents a snapshot of the API which can be invoked by clients.
For this example, I'm assuming that `inference_handler.py` contains your inference model code and a Lambda-compatible handler function named `handler`. Here's what the full Pulumi program looks like:
```python
import pulumi
import pulumi_aws as aws

# Assumes an inference_handler.py file (zipped into inference_handler.zip)
# with a 'handler' function compatible with AWS Lambda's Python runtime.
lambda_role = aws.iam.Role('lambdaRole',
    assume_role_policy="""{
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            }
        }]
    }""")

policy_attachment = aws.iam.RolePolicyAttachment('lambdaPolicyAttachment',
    role=lambda_role.name,
    policy_arn=aws.iam.ManagedPolicy.AWS_LAMBDA_BASIC_EXECUTION_ROLE.value)

# Upload our model inference code to Lambda
inference_lambda = aws.lambda_.Function('inferenceLambda',
    code=pulumi.FileArchive('./inference_handler.zip'),
    role=lambda_role.arn,
    handler='inference_handler.handler',
    runtime='python3.12')

# API Gateway: REST API
rest_api = aws.apigateway.RestApi('modelInferenceApi',
    description='Endpoint for model inference')

# Resource setup: an '/infer' path under the root '/'
resource = aws.apigateway.Resource('apiResource',
    parent_id=rest_api.root_resource_id,
    path_part='infer',
    rest_api=rest_api.id)

# Method for invoking the API endpoint
method = aws.apigateway.Method('apiMethod',
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method='POST',
    authorization='NONE',
    api_key_required=False,
    request_parameters={'method.request.header.Content-Type': False})

# Lambda proxy integration: forwards incoming requests to the function
integration = aws.apigateway.Integration('lambdaIntegration',
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method=method.http_method,
    integration_http_method='POST',
    type='AWS_PROXY',
    uri=inference_lambda.invoke_arn)

# Grant API Gateway permission to invoke the Lambda function
lambda_permission = aws.lambda_.Permission('apiGatewayPermission',
    action='lambda:InvokeFunction',
    function=inference_lambda.name,
    principal='apigateway.amazonaws.com',
    source_arn=pulumi.Output.concat(rest_api.execution_arn, '/*/*'))

# Deployment of the API. A deployment is a snapshot of the API's methods
# and resources, so it must be created after them. Note: if you later
# change the API, you may need to force a new deployment (for example via
# the `triggers` argument) for the changes to take effect.
deployment = aws.apigateway.Deployment('apiDeployment',
    rest_api=rest_api.id,
    opts=pulumi.ResourceOptions(depends_on=[method, integration]))

# Stage where our deployment is accessible
stage = aws.apigateway.Stage('apiStage',
    rest_api=rest_api.id,
    deployment=deployment.id,
    stage_name='v1')

# Export the HTTPS endpoint of the deployment
pulumi.export('api_url', pulumi.Output.concat(
    'https://', rest_api.id, '.execute-api.', aws.config.region,
    '.amazonaws.com/', stage.stage_name, '/infer'))
```
In this program:
- We create an IAM role for our Lambda function with basic execution permissions.
- The Lambda function is then defined with Python 3.12 as the runtime and points to the `handler` function inside our zipped `inference_handler` module.
- Our API Gateway REST API is set up with a single resource and method for POST requests.
- An integration is defined to connect incoming API requests to our Lambda function, and a Lambda permission grants API Gateway the right to invoke it.
- We deploy our API, associating it with the Lambda integration, and create a stage named `v1` for access.
- The API Gateway endpoint URL is then exported as an output of our Pulumi program. You can use this URL to interact with your model inference API.
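For illustration, here is how you might call the deployed endpoint from Python using only the standard library. The URL placeholder and the payload shape are assumptions: substitute the `api_url` output from your deployment, and send whatever structure your `handler` expects.

```python
import json
import urllib.request

# Replace with the 'api_url' value exported by `pulumi up`.
API_URL = 'https://<api-id>.execute-api.<region>.amazonaws.com/v1/infer'

# Hypothetical payload; the expected shape depends on your handler.
payload = json.dumps({'inputs': [1.0, 2.0, 3.0]}).encode('utf-8')

request = urllib.request.Request(
    API_URL,
    data=payload,
    headers={'Content-Type': 'application/json'},
    method='POST')

# Send the POST request and print the decoded JSON response.
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```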
What is not shown in this code snippet is the definition of your model code within a file `inference_handler.py`, as well as its packaging into a ZIP file named `inference_handler.zip`. The code in `inference_handler.py` should expose a function named `handler` that can handle Lambda events.
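Here is a minimal sketch of what `inference_handler.py` might look like. The model-loading and prediction steps are placeholders for your own code, but the event and response shapes follow Lambda's proxy-integration contract, which the `AWS_PROXY` integration above requires:

```python
import json

# Placeholder for model loading: deserialize your model once at cold
# start so it is reused across invocations.
MODEL = None  # e.g. MODEL = my_framework.load('model.bin')

def handler(event, context):
    # With an AWS_PROXY integration, the raw HTTP request body arrives
    # as a string under event['body'].
    body = json.loads(event.get('body') or '{}')

    # Placeholder inference step; replace with a call to your model,
    # e.g. prediction = MODEL.predict(body['inputs']).
    prediction = {'echo': body}

    # Proxy integrations require this response shape.
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(prediction),
    }
```

Package the file (together with any dependencies) into `inference_handler.zip` before deploying; for pure-Python code, `zip inference_handler.zip inference_handler.py` is enough.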
To run the Pulumi program, you will need Pulumi installed and AWS credentials configured. Then run `pulumi up` to create the resources. After the program has finished, Pulumi will output the API URL you can use to send POST requests to your model inference endpoint.