1. Intelligent Document Parsing with AWS Lambda triggered by S3


    Intelligent document parsing with AWS Lambda is a common cloud pattern: documents are stored in an S3 bucket, and each upload invokes a Lambda function that processes the new object. Parsing typically means extracting text, metadata, or specific data points from the document, using logic implemented in the Lambda function.

    Here's how to achieve this with Pulumi in Python:

    1. Create an S3 Bucket: An AWS S3 bucket is required to store the documents that need to be parsed.

    2. Create a Lambda Function: The Lambda function will contain the code that parses the documents. It can be triggered by an event that signifies a new document's arrival in S3.

    3. Grant Permissions: The Lambda function's execution role must be allowed to read objects from the S3 bucket, and S3 must be allowed to invoke the function.

    4. Configure S3 Event Notifications: This involves setting up the S3 bucket to publish events to AWS Lambda, effectively triggering the function upon the desired S3 operation, such as when a new file is uploaded.
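
    The read permission from step 3 can be written as a small IAM policy document. The following is a minimal sketch, assuming the function only needs to fetch objects; the helper name s3_read_policy and the optional prefix argument are illustrative and not part of any AWS or Pulumi API.

```python
import json


def s3_read_policy(bucket_arn: str, prefix: str = "") -> str:
    """Return an IAM policy document (JSON) allowing s3:GetObject under a prefix."""
    statement = {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        # Object-level actions apply to object ARNs, hence the trailing /*.
        "Resource": f"{bucket_arn}/{prefix}*",
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]})
```

    In a Pulumi program the bucket ARN is only known at deployment time, so a document like this would typically be produced with bucket.arn.apply(s3_read_policy) and passed as the policy of an aws.iam.RolePolicy attached to the function's role.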

    Below is a complete Pulumi program in Python to set up this architecture:

    import json

    import pulumi
    import pulumi_aws as aws

    # Create an S3 bucket that will store the documents.
    bucket = aws.s3.Bucket("documentsBucket")

    # Upload an example document to S3. The key sits under documents/ so that it
    # matches the notification filter below. Objects created before the
    # notification exists do not trigger the function.
    example_document = aws.s3.BucketObject(
        "exampleDocument",
        bucket=bucket.id,
        key="documents/document-to-parse.pdf",
        source=pulumi.FileAsset("path-to-your-document.pdf"),
    )

    # IAM role that the Lambda function assumes at run time.
    lambda_role = aws.iam.Role(
        "lambdaRole",
        assume_role_policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "sts:AssumeRole",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Effect": "Allow",
            }],
        }),
    )

    # Attach the AWS managed basic execution policy (CloudWatch Logs access only).
    role_policy_attachment = aws.iam.RolePolicyAttachment(
        "lambdaRoleAttachment",
        role=lambda_role.name,
        policy_arn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    )

    # Allow the function to read the uploaded documents from the bucket.
    s3_read_policy = aws.iam.RolePolicy(
        "lambdaS3ReadPolicy",
        role=lambda_role.id,
        policy=bucket.arn.apply(lambda arn: json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{arn}/*",
            }],
        })),
    )

    # Create the Lambda function.
    lambda_function = aws.lambda_.Function(
        "documentParser",
        # Runtime: use a currently supported runtime that suits your code.
        runtime="python3.12",
        # Handler: the function in your code that Lambda calls to begin execution
        # (file parser.py, function handler).
        handler="parser.handler",
        # Code: the path to the zipped code for your Lambda function.
        code=pulumi.FileArchive("./parser.zip"),
        # Role: the IAM role that Lambda assumes when it executes your function.
        role=lambda_role.arn,
        # Environment variables: settings your parser reads at run time.
        environment=aws.lambda_.FunctionEnvironmentArgs(variables={
            "S3_BUCKET_NAME": bucket.id,
        }),
    )

    # Allow S3 to invoke the Lambda function for events from this bucket.
    lambda_permission = aws.lambda_.Permission(
        "lambdaPermission",
        action="lambda:InvokeFunction",
        function=lambda_function.arn,
        principal="s3.amazonaws.com",
        source_arn=bucket.arn,
    )

    # Create a notification for the S3 bucket to trigger the Lambda function.
    # S3 validates the destination when the notification is created, so the
    # explicit depends_on makes sure the Lambda permission exists first.
    bucket_notification = aws.s3.BucketNotification(
        "bucketNotification",
        bucket=bucket.id,
        lambda_functions=[aws.s3.BucketNotificationLambdaFunctionArgs(
            lambda_function_arn=lambda_function.arn,
            events=["s3:ObjectCreated:*"],
            filter_prefix="documents/",
            filter_suffix=".pdf",
        )],
        opts=pulumi.ResourceOptions(depends_on=[lambda_permission]),
    )

    # Export the S3 bucket name and Lambda function ARN.
    pulumi.export("bucketName", bucket.id)
    pulumi.export("lambdaFunctionArn", lambda_function.arn)

    Explanation of the Pulumi program:

    • Defines an S3 bucket to store documents and uploads a sample document for demonstration.
    • Creates an IAM role whose trust policy lets the Lambda service assume it, and attaches the AWS managed AWSLambdaBasicExecutionRole policy so the function can write logs to CloudWatch. That managed policy covers logging only; reading objects from the bucket also requires s3:GetObject on the bucket's objects.
    • Deploys a Lambda function with a Python runtime and code uploaded as a zip archive. The function receives the bucket name as an environment variable and runs under the IAM role above.
    • Creates a Lambda permission that allows S3 to invoke the function.
    • Configures the bucket to notify the function whenever an object is created whose key matches the given prefix and suffix, here .pdf files under documents/.
    • The depends_on option ensures that the Lambda permission exists before the bucket notification is created. S3 validates the destination when the notification is configured and rejects it if S3 is not yet allowed to invoke the function.
    • Exports provide outputs for the S3 bucket name and the Lambda function ARN, which can be used to identify these resources in AWS and Pulumi.
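
    The program refers to a handler named parser.handler but does not show it. The following is a minimal sketch of what parser.py could look like, assuming the function only needs the bucket and key of each uploaded object; extract_objects is a hypothetical helper, and the actual parsing logic is left as a placeholder.

```python
import urllib.parse


def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Entry point matching handler="parser.handler" in the Pulumi program."""
    parsed = []
    for bucket, key in extract_objects(event):
        # Placeholder: fetch the object (for example with boto3's
        # get_object) and run the real parsing logic here.
        parsed.append({"bucket": bucket, "key": key})
    return {"documents": parsed}
```

    Keeping the event decoding in its own function makes it possible to test the handler locally with a hand-written event dictionary before deploying.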

    What you should have ready before running this program:

    • You should have Pulumi installed and configured with an AWS account.
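    • Create the parser.zip file containing your Lambda function code and any dependencies. Because the handler is set to parser.handler, the archive needs a parser.py at its root that defines a function named handler.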
    • If your Lambda function requires additional permissions to interact with other AWS services, you will need to modify the IAM role and policy accordingly.
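    • Deploy to a region where both Lambda and S3 are available. IAM is a global service, so the managed policy referenced by the role is not tied to a region; in other partitions, such as AWS GovCloud (US) or China, its ARN begins with arn:aws-us-gov or arn:aws-cn instead of arn:aws.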
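
    One way to produce parser.zip is with Python's standard zipfile module. The following is a sketch under assumptions: build_archive and the directory layout are hypothetical, and dependencies are expected to be already installed into the source directory.

```python
import zipfile
from pathlib import Path


def build_archive(source_dir: str, archive_path: str) -> list[str]:
    """Zip every file under source_dir, keeping paths relative to it."""
    root = Path(source_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the source directory so that
                # parser.py ends up at the root of the archive.
                arcname = path.relative_to(root).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

    For example, build_archive("parser_src", "parser.zip") would package a hypothetical parser_src directory that holds parser.py. The archive should be written outside the source directory so that it does not include itself.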