Real-time Data Processing for AI with AWS Lambda and S3

Pulumi · Accepted Answer

Real-time data processing underpins many AI-driven applications, such as those that run analytics on incoming data or must act on new data as soon as it arrives. AWS provides services that fit this pattern, with AWS Lambda for compute and S3 for storage. S3 event notifications are typically delivered within seconds, so the pipeline described here is event-driven and near real-time rather than a true streaming system. With Pulumi, you can define and deploy these services programmatically as infrastructure-as-code.

Below, I will guide you through setting up a simple real-time data processing pipeline using AWS Lambda and S3 with Pulumi in Python. The basic idea is that files uploaded to an S3 bucket will trigger a Lambda function to process the data immediately.

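We will create the following resources:

1. An S3 bucket: This is the storage where your uploaded data files will reside.
2. An IAM role: This is the execution role that the Lambda function assumes, with the AWS-managed basic execution policy attached so the function can write logs.
3. A Lambda function: This serverless compute service will process the data shortly after it arrives. It will be triggered whenever a matching file is uploaded to the S3 bucket.
4. A Lambda permission: This grants the S3 service the necessary permission to invoke the Lambda function in response to events such as file uploads.
5. A bucket notification: This connects the bucket's object-creation events to the Lambda function.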

Let's look at how you can define this setup:

```python
import pulumi
import pulumi_aws as aws

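# Create an S3 bucket to store files
# (Pulumi derives the physical bucket name from this name, and S3 bucket names cannot contain underscores)
s3_bucket = aws.s3.Bucket("data-bucket")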

# Create the IAM execution role for the Lambda function
# The assume role policy allows the Lambda service to assume this role
lambda_role = aws.iam.Role("lambda_role", 
    assume_role_policy="""{
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            }
        }]
    }"""
)

# Attach the AWS-managed basic execution policy, which only grants permission to write logs to CloudWatch
lambda_exec_policy_attachment = aws.iam.RolePolicyAttachment("lambda_exec_policy_attachment",
    role=lambda_role.name,
    policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
)

# Create the Lambda function
lambda_function = aws.lambda_.Function("data_processor_function",
    runtime="python3.12", # Choose a currently supported Lambda runtime (python3.8 is deprecated)
    code=pulumi.FileArchive("./function"), # Specify the path to the Lambda function code
    handler="handler.handler", # The function entrypoint in our code
    role=lambda_role.arn,
    timeout=300, # Maximum time that the function can run (in seconds)
    memory_size=128, # Set the memory allocated for the Lambda function
    # This is the environment variable that we can access inside our Lambda function
    environment=aws.lambda_.FunctionEnvironmentArgs(
        variables={
            "BUCKET_NAME": s3_bucket.bucket # Passes our S3 bucket name to the Lambda environment
        }
    ),
    opts=pulumi.ResourceOptions(depends_on=[lambda_exec_policy_attachment])
)

# Grant the S3 bucket permission to invoke the Lambda function
lambda_permission = aws.lambda_.Permission("lambda_permission",
    action="lambda:InvokeFunction",
    function=lambda_function.name,
    principal="s3.amazonaws.com",
    source_arn=s3_bucket.arn
)

# Event to link S3 object creation to Lambda function execution
s3_bucket_notification = aws.s3.BucketNotification("s3_bucket_notification",
    bucket=s3_bucket.id,
    lambda_functions=[aws.s3.BucketNotificationLambdaFunctionArgs(
        lambda_function_arn=lambda_function.arn,
        events=["s3:ObjectCreated:*"],
        filter_prefix="input/", # Only trigger for object keys that start with 'input/'
        filter_suffix=".json"  # Only trigger for files with '.json' extension
    )],
    opts=pulumi.ResourceOptions(depends_on=[lambda_permission])
)

# Export the name of the bucket and the Lambda function ARN
pulumi.export("bucket_name", s3_bucket.bucket)
pulumi.export("lambda_function_arn", lambda_function.arn)
```

Here’s a step-by-step walkthrough of the code:

1. **S3 Bucket**: We create an S3 bucket using `aws.s3.Bucket`. This is where the data files will be stored.
   
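2. **IAM Role and Policy**: Before creating the Lambda function, we establish an IAM role (`aws.iam.Role`) that the Lambda service can assume, and attach the `AWSLambdaBasicExecutionRole` managed policy to it. That policy only covers writing logs to CloudWatch. If the function needs to read the uploaded objects, the role also needs a policy that allows `s3:GetObject` on the bucket.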

3. **Lambda Function**: We define the Lambda function using `aws.lambda_.Function`, specifying the runtime, code location, handler, associated role, and other configurations.

4. **Lambda Permission**: The `aws.lambda_.Permission` resource is then created to allow the S3 service to invoke the Lambda function, restricted to events that originate from this bucket through `source_arn`.

5. **Bucket Notification**: Finally, `aws.s3.BucketNotification` links the S3 bucket events to the Lambda function. We specify that the function should be invoked whenever a new `.json` file is uploaded to the `input/` path of the bucket.

6. **Pulumi Exports**: We export the bucket name and Lambda function ARN for easy reference.
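
The role in the walkthrough above can write logs but cannot read objects from the bucket. As a minimal sketch of one way to close that gap, the helper below builds a policy document that allows `s3:GetObject` under the `input/` prefix. The helper name `build_s3_read_policy` and the resource name `lambda-s3-read-policy` are assumptions for illustration, not part of the original program, and the commented wiring assumes the `lambda_role` and `s3_bucket` variables defined earlier.

```python
import json


def build_s3_read_policy(bucket_arn: str) -> str:
    """Return an IAM policy document (JSON string) that allows reading objects under input/."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            # Object-level permissions apply to object ARNs, not to the bucket ARN itself
            "Resource": f"{bucket_arn}/input/*",
        }],
    })


# Hypothetical wiring inside the Pulumi program (resource name is an assumption):
# lambda_s3_read_policy = aws.iam.RolePolicy("lambda-s3-read-policy",
#     role=lambda_role.id,
#     policy=s3_bucket.arn.apply(build_s3_read_policy),
# )
```

Using `apply` lets the policy be built once the bucket ARN is known at deployment time. If the function also writes results, it would need `s3:PutObject` on a different prefix, so that its own output does not retrigger it.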

To run this Pulumi program, you need a directory containing your Lambda code. The `handler="handler.handler"` setting means Lambda looks for a function named `handler` in a file named `handler.py`, so place your processing logic in `function/handler.py`.
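
For reference, here is a minimal sketch of what `function/handler.py` might look like. It assumes the uploaded files are JSON documents and that the execution role is allowed to read them. The `extract_objects` helper and the optional `s3_client` parameter are illustrative additions (the latter makes local testing easier), and the processing step is a placeholder for your own logic.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+')
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context, s3_client=None):
    """Lambda entrypoint: load each newly uploaded JSON object and summarize it."""
    if s3_client is None:
        import boto3  # bundled with the Lambda Python runtime
        s3_client = boto3.client("s3")

    results = []
    for bucket, key in extract_objects(event):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        data = json.loads(body)
        # Placeholder processing step: replace with your own logic
        results.append({"key": key, "json_type": type(data).__name__})

    print(json.dumps(results))  # printed output ends up in CloudWatch Logs
    return results
```

The handler reads the bucket name from the event itself, so it does not depend on the `BUCKET_NAME` environment variable, although that variable remains available through `os.environ` if you prefer it.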

Please remember to replace `"./function"` with the actual file path of your Lambda code when deploying this stack. The Lambda function's directory should be placed at the same level as your `Pulumi.yaml` file or you should adjust the path accordingly.