Serverless AI API Endpoints with AWS API Gateway
When creating serverless AI API endpoints with AWS API Gateway, you'll typically want to achieve the following:
- Create an API Gateway: This acts as the front door for your API, handling incoming API calls, including traffic management, authorization and access control, monitoring, and API version management.
- Integrate with AWS Lambda: AWS Lambda lets you run code without provisioning or managing servers, and it's commonly used for serverless applications like AI APIs. Lambda functions are triggered by AWS services such as API Gateway.
- Deploy API Stages: After your API is set up, you deploy it to a stage, which is a named reference to a deployment that you use to access the deployed API.
- Define Resources and Methods: Within the API Gateway, you define a set of resources and HTTP methods (like GET, POST, PUT, DELETE) that correspond to the operations clients can perform on the API.
- Enable CORS: If your API is going to be called from web applications hosted on domains different from where your API is hosted, you'll need to enable Cross-Origin Resource Sharing (CORS); a minimal sketch follows this list.
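As a quick illustration of that last point, the HTTP API resource in Pulumi accepts a CORS configuration directly. This is a minimal, standalone sketch; the origin `https://app.example.com` is a hypothetical front-end domain and is not part of the main program below:

```python
from pulumi_aws import apigatewayv2

# Hypothetical example: allow a single web origin to call an HTTP API from the browser.
cors_api = apigatewayv2.Api("corsEnabledApi",
    protocol_type="HTTP",
    cors_configuration=apigatewayv2.ApiCorsConfigurationArgs(
        allow_origins=["https://app.example.com"],       # your front-end origin
        allow_methods=["GET", "POST", "OPTIONS"],
        allow_headers=["content-type", "authorization"],
        max_age=3600,                                    # cache preflight responses for an hour
    ))
```

The same `cors_configuration` argument can be added to the API created in the main program below if browser clients need to call it.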
Below is a Pulumi program written in Python that sets up serverless AI API endpoints using AWS API Gateway. This program illustrates how to create an API, integrate it with a Lambda function, and deploy it. To focus on the integration with API Gateway, I'm not including the Lambda function code, but in a real-world scenario, you would replace the placeholder with actual AI inference function code.
```python
import pulumi
from pulumi_aws import apigatewayv2, iam, lambda_

# IAM role that the Lambda function assumes at runtime.
# The trust policy allows the Lambda service to assume this role.
lambda_role = iam.Role("lambdaRole",
    assume_role_policy="""{
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": { "Service": "lambda.amazonaws.com" }
        }]
    }""")

# Allow the function to write its logs to CloudWatch.
log_policy = iam.RolePolicyAttachment("lambdaLogs",
    role=lambda_role.name,
    policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole")

# This Lambda function represents the serverless AI inference function.
# In a real scenario, this would be the implementation of your AI logic.
# Replace '<lambda-handler>' with your Lambda function's actual handler.
ai_handler = lambda_.Function("aiHandler",
    role=lambda_role.arn,
    handler="<lambda-handler>",
    runtime="python3.12",  # use a currently supported Lambda Python runtime
    code=pulumi.AssetArchive({
        ".": pulumi.FileArchive("./path-to-your/lambda-code")
    }),
    timeout=30)

# Create an HTTP API in API Gateway using the quick-create route/target shortcut.
api = apigatewayv2.Api("httpApiGateway",
    protocol_type="HTTP",
    route_key="ANY /",  # any HTTP method on the root path is routed to the Lambda function
    target=ai_handler.invoke_arn)

# Lambda permission to allow invocation from API Gateway (any stage, any route).
invoke_permission = lambda_.Permission("invokePermission",
    action="lambda:InvokeFunction",
    function=ai_handler.name,
    principal="apigateway.amazonaws.com",
    source_arn=pulumi.Output.concat(api.execution_arn, "/*/*"))

# Define the API integration, linking the API Gateway to the Lambda handler.
integration = apigatewayv2.Integration("apiIntegration",
    api_id=api.id,
    integration_type="AWS_PROXY",
    integration_uri=ai_handler.invoke_arn)

# Deploy the API to a stage with basic throttling.
stage = apigatewayv2.Stage("apiStage",
    api_id=api.id,
    name="v1",
    auto_deploy=True,
    route_settings=[apigatewayv2.StageRouteSettingArgs(
        route_key=api.route_key,
        throttling_burst_limit=5,
        throttling_rate_limit=10,
    )])

# Export the HTTPS endpoint of the API Gateway (api_endpoint already includes the scheme).
pulumi.export("api_url", api.api_endpoint)
```
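The program above deliberately leaves the handler code as a placeholder. For orientation only, here is a hypothetical `handler.py` that could live under `./path-to-your/lambda-code` (in which case the handler string would be `handler.handler`); the actual inference logic depends entirely on your model:

```python
import json

def handler(event, context):
    """Hypothetical AI inference handler invoked by API Gateway (AWS_PROXY format)."""
    body = json.loads(event.get("body") or "{}")
    text = body.get("input", "")

    # Placeholder for real model inference, e.g. a model packaged with the function.
    prediction = {"echo": text, "label": "positive", "score": 0.99}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(prediction),
    }
```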
Here's what's happening in this Pulumi program:
- We create a Lambda function `aiHandler` which will serve our AI inference code. The `iam.Role` assigned to it carries a trust policy that lets the Lambda service assume the role, and the attached `AWSLambdaBasicExecutionRole` managed policy allows the function to write CloudWatch logs.
- We then create an HTTP API Gateway `httpApiGateway` with a route key that proxies any HTTP method to our Lambda.
- The `invokePermission` grants API Gateway the necessary permission to invoke our AI handler Lambda function.
- `apiIntegration` connects the API Gateway with the Lambda function using `AWS_PROXY`, which passes incoming requests directly to the Lambda and returns its response to the client.
- The `apiStage` represents a logical stage of our API, reflecting a deployment snapshot. In this code, we've set up basic throttling to manage traffic.
- Lastly, we expose the URL endpoint of our deployed API Gateway through Pulumi's export feature, which allows us to easily retrieve and use the endpoint after deployment.
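After running `pulumi up`, you can exercise the exported endpoint with any HTTP client. Here is a quick smoke test using only the Python standard library; it assumes a JSON-returning handler like the hypothetical one above, and the URL shown is illustrative, to be replaced with your real `api_url` output:

```python
import json
import urllib.request

# Replace with the value of `pulumi stack output api_url`.
api_url = "https://<api-id>.execute-api.<region>.amazonaws.com"

# The payload shape is whatever your Lambda handler expects.
payload = json.dumps({"input": "example text"}).encode("utf-8")
request = urllib.request.Request(api_url, data=payload,
                                 headers={"Content-Type": "application/json"})

with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode("utf-8"))
```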
You would need to replace the placeholder parts (like `<lambda-handler>` and `'./path-to-your/lambda-code'`) with your actual Lambda function handler and the path to your code. Make sure the AWS Pulumi provider is configured with the necessary credentials before running this program. The Lambda function can wrap any AI model that you have packaged in a Lambda-compatible format (e.g., a zipped Python package).

Remember that you may want to put additional measures in place depending on your requirements, such as API keys, more complex routing, and authorization mechanisms; a sketch of a JWT authorizer follows below. The above program is a foundational starting point for a serverless architecture on AWS.
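For example, if you use a JWT-issuing identity provider (such as Amazon Cognito), HTTP APIs support JWT authorizers. The sketch below reuses the `api` and `integration` resources from the main program; the issuer, audience, and the `/predict` route are placeholders you would supply:

```python
from pulumi_aws import apigatewayv2

# Hypothetical JWT authorizer; the issuer and audience come from your identity provider.
jwt_authorizer = apigatewayv2.Authorizer("jwtAuthorizer",
    api_id=api.id,
    authorizer_type="JWT",
    identity_sources=["$request.header.Authorization"],
    jwt_configuration=apigatewayv2.AuthorizerJwtConfigurationArgs(
        issuer="https://<your-issuer-domain>",
        audiences=["<your-audience>"],
    ))

# Routes created explicitly (rather than via quick-create) can then require the authorizer.
protected_route = apigatewayv2.Route("protectedRoute",
    api_id=api.id,
    route_key="POST /predict",
    target=integration.id.apply(lambda integration_id: f"integrations/{integration_id}"),
    authorization_type="JWT",
    authorizer_id=jwt_authorizer.id)
```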