API Endpoints for Large Language Models with Service Gateway

Pulumi · Accepted Answer

Creating API endpoints for a large language model means setting up a cloud service that receives HTTP requests and forwards them to the underlying model for processing. A typical architecture has an API gateway to manage incoming requests, a load balancer to distribute the workload, and a set of compute instances that host the model.

A service gateway acts as a single entry point in front of your services and routes incoming traffic to the right one. It helps with scaling, securing, and managing the APIs. In this guide, I'll show you how to create an API gateway and a backend for serving a large language model using cloud services.

In the Pulumi program below, we will create an API gateway on AWS using Amazon API Gateway, integrate it with the AWS Lambda function where our language model resides, and create the necessary roles and permissions. In this example Lambda takes the place of the load balancer and compute instances, since it scales the backend for us. For simplicity, we'll assume that the language model is already set up to work with AWS Lambda.

```python
import pulumi
import pulumi_aws as aws

# Create a new role for the API Gateway so it can send requests to Lambda.
api_gateway_role = aws.iam.Role("apiGatewayRole",
    assume_role_policy="""{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": "apigateway.amazonaws.com"
                }
            }
        ]
    }""")

# Assume a pre-existing Lambda function is called 'LanguageModelLambda'.
# Look it up so its ARNs can be used in the policy and the integration below.
language_model_lambda = aws.lambda_.get_function(function_name="LanguageModelLambda")

# Attach a policy to the role we created above that allows invoking that Lambda function.
policy = aws.iam.RolePolicy("apiGatewayLambdaPolicy",
    role=api_gateway_role.id,
    policy=f"""{{
        "Version": "2012-10-17",
        "Statement": [
            {{
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": "{language_model_lambda.arn}"
            }}
        ]
    }}""")

# Create a new REST API.
rest_api = aws.apigateway.RestApi("languageModelApi",
    description="API for Large Language Model",
    policy="""{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*/*/*"
            }
        ]
    }""")

# Create a resource to represent the path '/model'.
resource = aws.apigateway.Resource("modelResource",
    rest_api=rest_api.id,
    parent_id=rest_api.root_resource_id,
    path_part="model")

# Create a method for the '/model' resource. We use 'ANY' to handle any HTTP method.
method = aws.apigateway.Method("modelMethod",
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method="ANY",
    authorization="NONE")

# Create an integration to connect the API method to the Lambda function.
integration = aws.apigateway.Integration("lambdaIntegration",
    rest_api=rest_api.id,
    resource_id=resource.id,
    http_method=method.http_method,
    integration_http_method="POST", # Lambda is always invoked with POST.
    type="AWS_PROXY",
    # Have API Gateway assume the role created above when it invokes the function.
    credentials=api_gateway_role.arn,
    # invoke_arn is already the full API Gateway invocation URI for the function.
    uri=language_model_lambda.invoke_arn)

# Create a deployment of the REST API.
deployment = aws.apigateway.Deployment("apiDeployment",
    rest_api=rest_api.id,
    # Redeploy whenever the resource path, method, or integration target changes.
    triggers={"redeployment": pulumi.Output.all(
        resource.path_part, method.http_method, integration.uri
    ).apply(lambda parts: "|".join(str(p) for p in parts))},
    # The API must contain its method and integration before it can be deployed.
    opts=pulumi.ResourceOptions(depends_on=[method, integration]))

# Create a stage, which is a named reference to a deployment to access the API.
stage = aws.apigateway.Stage("apiStage",
    rest_api=rest_api.id,
    deployment=deployment.id,
    stage_name="v1")

# Export the full URL of the '/model' endpoint so we can call it.
pulumi.export("invoke_url", pulumi.Output.concat(stage.invoke_url, "/model"))
```

This Pulumi program does the following:

1. **IAM Role and Policy**: It creates a new IAM role for the API Gateway and attaches a policy allowing it to invoke Lambda functions.

2. **Lambda Function**: It looks up a pre-existing Lambda function called 'LanguageModelLambda', which serves our large language model, so that it can be wired into the API.

3. **API Gateway Setup**: An Amazon API Gateway 'REST API' is set up with a resource representing the path '/model' and a method to accept HTTP requests. An 'ANY' method is used here to allow different types of HTTP methods.

4. **API Gateway Integration**: It integrates the API Gateway with the Lambda function using the AWS_PROXY integration type. With this type, API Gateway passes the whole HTTP request to the Lambda function as an event and returns the function's response to the caller.

5. **Deployment and Stage**: It deploys the API and sets it up with a stage named 'v1', which we can use to access our API.

6. **Export Invoke URL**: At the end, the program exports the invoke URL of our API. This is the URL that you can use to send requests to our large language model.
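
Because the program only wires up an existing function, it may help to see what a handler compatible with the AWS_PROXY integration could look like. The following is a minimal sketch, not the actual 'LanguageModelLambda' code: the `generate_text` function and the `prompt` and `completion` field names are hypothetical placeholders. What the proxy integration does require is the response shape, a dictionary with `statusCode`, `headers`, and a string `body`.

```python
import json


def generate_text(prompt: str) -> str:
    """Hypothetical stand-in for the real model call (an assumption, not from the answer)."""
    return f"echo: {prompt}"


def _response(status_code: int, body: dict) -> dict:
    """Build a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),  # the body must be a string, not a dict
    }


def handler(event, context):
    """Entry point for requests forwarded by API Gateway (AWS_PROXY)."""
    try:
        # With proxy integration the request body arrives as a string, or None if empty.
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "request body must be valid JSON"})

    # 'prompt' is an assumed field name for this sketch.
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt:
        return _response(400, {"error": "missing 'prompt' field"})

    return _response(200, {"completion": generate_text(prompt)})
```

Returning explicit 400 responses keeps malformed requests from surfacing as generic gateway errors. The sketch ignores base64-encoded bodies, which a real handler may need to decode.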

To use this code, replace 'LanguageModelLambda' with the actual name of your Lambda function. You will also need to adjust the policies to match the actual resources in your AWS account. This is a basic setup to get started with exposing your large language model via an API Gateway. Authentication, rate limiting, and logging can be added on top of it based on your requirements.
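
Once the stack is deployed, the endpoint can be called from any HTTP client. Here is a small sketch using only the Python standard library. It assumes the function accepts a JSON body with a `prompt` field, which is a hypothetical request format, and the URL shown in the usage comment is a placeholder for the `/model` path under your own `v1` stage.

```python
import json
import urllib.request


def build_request(endpoint_url: str, prompt: str) -> urllib.request.Request:
    """Prepare a JSON POST request for the '/model' endpoint."""
    # The 'prompt' field is an assumed request format for this sketch.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint_url: str, prompt: str, timeout: float = 30.0) -> dict:
    """Send the prompt to the API and return the decoded JSON response."""
    request = build_request(endpoint_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (placeholder URL; substitute your own stage URL followed by /model):
# result = ask("https://abc123.execute-api.us-east-1.amazonaws.com/v1/model", "Hello")
```

Keeping request construction separate from the network call makes the request format easy to check without deploying anything.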