Serverless AI Inference Storage on AWS S3
To set up serverless AI inference storage on AWS, we'll use Amazon S3 to store our models and data, AWS Lambda to run serverless inference code in response to triggers (such as new data uploads), and Amazon SageMaker to train and deploy machine learning models. Here's how to do it with Pulumi in Python:
First, we create an Amazon S3 bucket for storing model data and inference input/output. S3 is a highly scalable object storage service, perfect for this use case.
Next, we deploy an AWS Lambda function. Lambda is a serverless compute service that runs your code in response to events without you managing any servers. This function will be responsible for performing the inference: it receives incoming data and calls the SageMaker model described next to produce predictions.
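The main program further below keeps this event wiring out for brevity. If you want new uploads to invoke the function automatically, a minimal sketch of the S3 trigger, assuming the inference_lambda and ai_models_bucket resources defined in that program and an illustrative inference/input/ key prefix, could look like this:

import pulumi
import pulumi_aws as aws

# Allow S3 to invoke the inference function.
allow_s3 = aws.lambda_.Permission("allowS3Invoke",
    action="lambda:InvokeFunction",
    function=inference_lambda.name,
    principal="s3.amazonaws.com",
    source_arn=ai_models_bucket.arn)

# Invoke the function whenever a new object lands under the inference/input/ prefix.
notification = aws.s3.BucketNotification("inferenceTrigger",
    bucket=ai_models_bucket.id,
    lambda_functions=[aws.s3.BucketNotificationLambdaFunctionArgs(
        lambda_function_arn=inference_lambda.arn,
        events=["s3:ObjectCreated:*"],
        filter_prefix="inference/input/")],
    opts=pulumi.ResourceOptions(depends_on=[allow_s3]))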
Then we bring in Amazon SageMaker, a fully managed service that lets developers and data scientists build, train, and deploy machine learning (ML) models quickly. We use it to create an inference endpoint that the Lambda function calls to perform the actual predictions.
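Note that the program below defines only the SageMaker model resource itself; serving live predictions additionally requires an endpoint configuration and an endpoint on top of it. A minimal sketch of that wiring, assuming the sagemaker_model resource from the program below and an illustrative instance type, might look like this:

import pulumi_aws as aws

# Endpoint configuration: which model to serve, and on what instances.
endpoint_config = aws.sagemaker.EndpointConfiguration("aiModelEndpointConfig",
    production_variants=[aws.sagemaker.EndpointConfigurationProductionVariantArgs(
        model_name=sagemaker_model.name,
        variant_name="AllTraffic",
        instance_type="ml.t2.medium",
        initial_instance_count=1)])

# The endpoint that the Lambda function invokes for predictions.
endpoint = aws.sagemaker.Endpoint("aiModelEndpoint",
    endpoint_config_name=endpoint_config.name)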
Here's a Pulumi program that ties it all together:
import json

import pulumi
import pulumi_aws as aws

# Create an S3 bucket to store AI models and inference data.
ai_models_bucket = aws.s3.Bucket("aiModelsBucket")

# Create a SageMaker model. For simplicity, we're not defining the details of the model here.
# Normally, you would specify the Docker image containing the model, etc.
# See the Pulumi registry docs for more details:
# https://www.pulumi.com/registry/packages/aws/api-docs/sagemaker/model/
sagemaker_model = aws.sagemaker.Model("aiModel",
    execution_role_arn="arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000001",
    primary_container=aws.sagemaker.ModelPrimaryContainerArgs(
        image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:latest",
        model_data_url=ai_models_bucket.bucket.apply(lambda name: f"s3://{name}/models/model.tar.gz")))

# IAM role that the Lambda function assumes at runtime.
lambda_role = aws.iam.Role("lambdaRole",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Effect": "Allow",
            "Sid": ""}]}))

# Upload the zipped Lambda code to the bucket.
lambda_code_object = aws.s3.BucketObject("inferenceLambdaCode",
    bucket=ai_models_bucket.id,
    key="lambda_code.zip",
    source=pulumi.FileArchive("./lambda_code.zip"))

# Deploy a Lambda function that will handle the inference.
inference_lambda = aws.lambda_.Function("inferenceLambda",
    runtime="python3.12",
    role=lambda_role.arn,
    handler="handler.main",
    s3_bucket=ai_models_bucket.id,
    s3_key=lambda_code_object.key)

# Grant the Lambda function's role access to objects in the S3 bucket.
bucket_permissions = aws.s3.BucketPolicy("bucketPermissions",
    bucket=ai_models_bucket.id,
    policy=pulumi.Output.all(ai_models_bucket.id, lambda_role.arn).apply(lambda args: json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": args[1]},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{args[0]}/*"}]})))

# Export the names of the S3 bucket and Lambda function so we can easily identify them later.
pulumi.export('bucket_name', ai_models_bucket.id)
pulumi.export('lambda_arn', inference_lambda.arn)
This Pulumi program sets up the infrastructure needed for a serverless AI inference system on AWS. The model is assumed to be already trained and packaged into a Docker image; building that image is not covered here but would be part of a complete ML workflow. The Lambda code is packaged into a zip file (./lambda_code.zip) and uploaded to S3; that code must contain the logic for running inference against the SageMaker model.
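As a rough illustration of that logic, a handler.py matching the handler="handler.main" setting could call the endpoint via boto3. The endpoint name, environment variable, and payload shape below are assumptions, not part of the program above:

import json
import os

import boto3

# SageMaker runtime client used to call the deployed inference endpoint.
sagemaker_runtime = boto3.client("sagemaker-runtime")

# The endpoint name is assumed to be passed in through the function's environment.
ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT_NAME", "aiModelEndpoint")

def main(event, context):
    # Assume the triggering event carries a JSON-serializable payload to score.
    payload = json.dumps(event.get("data", {}))
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload)
    # The response body is a stream; read and decode the prediction.
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}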
Lastly, we export the bucket name and Lambda ARN so that they can be easily retrieved and used elsewhere, for example, in application code where you might be uploading data for inference.
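For instance, once the stack is deployed, a small client script could read the exported bucket name and upload a payload for inference; the file and key names here are illustrative:

import subprocess

import boto3

# Read the exported bucket name from the current Pulumi stack.
bucket_name = subprocess.check_output(
    ["pulumi", "stack", "output", "bucket_name"], text=True).strip()

# Upload a payload for the inference pipeline to pick up.
boto3.client("s3").upload_file("input.json", bucket_name, "inference/input/input.json")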
Because Pulumi manages infrastructure declaratively as desired state, scaling or changing this setup later is as simple as modifying the code and running it again, which is one of the key benefits of infrastructure as code (IaC).