Building a document analytics system with AWS Textract, processing content using Lambda, and storing insights in Amazon RDS
To accomplish this goal, this Pulumi program, written in Python, will do the following:
- Create an S3 bucket for storing your documents.
- Create a Lambda function that will be invoked every time a new document is uploaded to the S3 bucket. This function will call AWS Textract to extract the text from the document.
- Create an RDS instance in which the insights gained from Textract's analysis are stored.
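The second step depends on an S3 event notification plus a permission that lets S3 invoke the function. A minimal sketch of that wiring follows, assuming the bucket and function are created elsewhere in the program. The helper name `wire_upload_trigger` and the resource names are hypothetical, and the Pulumi imports sit inside the function only so the helper can be defined before Pulumi is loaded.

```python
def wire_upload_trigger(bucket, function):
    """Invoke `function` whenever an object is created in `bucket`.

    `bucket` is expected to be a pulumi_aws.s3.Bucket and `function`
    a pulumi_aws.lambda_.Function (both are assumptions of this sketch).
    """
    from pulumi import ResourceOptions
    from pulumi_aws import lambda_, s3

    # Allow the S3 service to invoke the function for events from this bucket
    permission = lambda_.Permission(
        'allow-s3-invoke',  # hypothetical resource name
        action='lambda:InvokeFunction',
        function=function.name,
        principal='s3.amazonaws.com',
        source_arn=bucket.arn,
    )

    # Send every object-created event to the function
    return s3.BucketNotification(
        'document-upload-notification',  # hypothetical resource name
        bucket=bucket.id,
        lambda_functions=[
            s3.BucketNotificationLambdaFunctionArgs(
                lambda_function_arn=function.arn,
                events=['s3:ObjectCreated:*'],
            )
        ],
        # The permission has to exist before S3 accepts the notification
        opts=ResourceOptions(depends_on=[permission]),
    )
```

Inside a Pulumi program this would be called once, for example `wire_upload_trigger(bucket, lambda_func)`. If the deployment package lives in the same bucket, a prefix or suffix filter on the notification may be worth adding so that uploading the zip does not trigger the function.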
Here is a simplified demonstration of your Pulumi program. You will need to manage permissions for the resources (e.g., allowing Lambda to call Textract and talk to S3 and RDS), which are not included in the snippet below. The S3 event notification that triggers the function on upload is not included either.
```python
import json

import pulumi
from pulumi_aws import iam, lambda_, rds, s3

# Create an S3 bucket for your documents
bucket = s3.Bucket('my-bucket')

# Trust policy that allows the Lambda service to assume the role
assume_role_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "lambda.amazonaws.com",
            },
            "Effect": "Allow",
        },
    ],
})

role = iam.Role('lambda-role', assume_role_policy=assume_role_policy)

# Attach the basic execution policy to the role.
# Textract and S3 permissions still have to be granted separately.
iam.RolePolicyAttachment(
    'lambda-s3-execution-role-policy',
    role=role.name,
    policy_arn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole',
)

# Upload the Lambda deployment package (assumes you have this zip locally)
bucket_object_key = 'text-extraction-lambda-handler.zip'
bucket_object = s3.BucketObject(
    bucket_object_key,
    bucket=bucket.id,
    key=bucket_object_key,
    source=pulumi.FileAsset(bucket_object_key),
    opts=pulumi.ResourceOptions(parent=bucket),
)

# Create the Lambda function from the uploaded package
lambda_func = lambda_.Function(
    'text-extractor',
    s3_bucket=bucket.id,
    s3_key=bucket_object.key,
    role=role.arn,
    handler='index.handler',
    runtime='python3.12',  # use a runtime that Lambda currently supports
)

# Create the RDS instance - replace with your preference of instance type and password
rds_instance = rds.Instance(
    'my-rds-instance',
    engine='mysql',
    instance_class='db.t2.micro',
    allocated_storage=20,
    storage_type='gp2',
    username='admin',
    password='mypassword',  # placeholder only; do not commit real credentials
    skip_final_snapshot=True,
)

# Ensure your Lambda function and RDS instance have the necessary permissions and networking setup.
```
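The snippet attaches only `AWSLambdaBasicExecutionRole`, which covers CloudWatch logging. As a rough sketch of the missing Textract and S3 permissions, the helper below builds an inline policy document. The function name `build_lambda_policy` and the exact choice of actions are assumptions for illustration; trim them to what your function really calls.

```python
import json


def build_lambda_policy(bucket_arn: str) -> str:
    """Return an IAM policy document (as JSON) for the extraction Lambda."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the uploaded documents from the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Call Textract's synchronous text-extraction operations;
                # the resource is left as "*" in this sketch
                "Effect": "Allow",
                "Action": [
                    "textract:DetectDocumentText",
                    "textract:AnalyzeDocument",
                ],
                "Resource": "*",
            },
        ],
    })
```

In the Pulumi program this could be attached with something like `iam.RolePolicy('lambda-textract-policy', role=role.id, policy=bucket.arn.apply(build_lambda_policy))`, where the resource name is again hypothetical. Access to RDS is usually a matter of networking and database credentials rather than IAM, so it is not covered by this policy.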
This leverages the `pulumi_aws.s3`, `pulumi_aws.lambda_`, `pulumi_aws.iam`, and `pulumi_aws.rds` modules. Please replace the Lambda code package with your actual AWS Lambda function that uses Textract and connects to the RDS instance.
Note: It is crucial to handle connection management well between Lambda and RDS. Each concurrent invocation can open its own database connection, so consider connection pooling if many Lambda functions will connect to your RDS instance in parallel.
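As a starting point for that Lambda function, here is a minimal sketch of an `index.py` whose `handler` matches the `index.handler` setting in the program. It assumes the `pymysql` package is bundled in the deployment zip, that the connection details arrive through the hypothetical environment variables `DB_HOST`, `DB_USER`, `DB_PASSWORD`, and `DB_NAME`, and that a hypothetical `documents` table with `s3_key` and `extracted_text` columns already exists. The connection is cached at module level so that warm invocations reuse it instead of reconnecting each time.

```python
import os
import urllib.parse

_connection = None  # reused across warm invocations of the same container


def get_connection():
    """Open the database connection once per container, then reuse it."""
    global _connection
    if _connection is None:
        import pymysql  # assumed to be bundled in the deployment package
        _connection = pymysql.connect(
            host=os.environ["DB_HOST"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
            database=os.environ["DB_NAME"],
        )
    else:
        # Revive a connection that timed out while the container was idle
        _connection.ping(reconnect=True)
    return _connection


def uploaded_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys arrive URL-encoded (for example, spaces become '+')
        key = urllib.parse.unquote_plus(s3_info["object"]["key"])
        yield s3_info["bucket"]["name"], key


def text_lines(textract_response):
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    textract = boto3.client("textract")
    conn = get_connection()
    processed = []
    for bucket, key in uploaded_objects(event):
        response = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        text = "\n".join(text_lines(response))
        with conn.cursor() as cursor:
            # Hypothetical table and columns; adapt to your schema
            cursor.execute(
                "INSERT INTO documents (s3_key, extracted_text) VALUES (%s, %s)",
                (key, text),
            )
        conn.commit()
        processed.append(key)
    return {"processed": processed}
```

Caching the connection limits each container to a single connection, but it does not cap the total across many concurrent containers. For that case a pooling layer in front of the database, such as Amazon RDS Proxy, is a common option.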
Remember to replace the RDS instance class, allocated storage, username, and password with your own values.
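One way to keep those values out of the source file is to read them from the stack configuration. The sketch below assumes a `pulumi.Config` object is passed in. The helper name `rds_settings` and the configuration keys (`dbInstanceClass`, `dbAllocatedStorage`, `dbUsername`, `dbPassword`) are hypothetical, and the fallbacks mirror the values used in the program.

```python
def rds_settings(config):
    """Build keyword arguments for rds.Instance from stack configuration.

    `config` is expected to be a pulumi.Config; all key names below are
    hypothetical and can be renamed freely.
    """
    return {
        "instance_class": config.get("dbInstanceClass") or "db.t2.micro",
        "allocated_storage": config.get_int("dbAllocatedStorage") or 20,
        "username": config.get("dbUsername") or "admin",
        # require_secret keeps the value marked as a secret and raises
        # an error if the key has not been set for the stack
        "password": config.require_secret("dbPassword"),
    }
```

The result can then be unpacked into the resource, for example `rds.Instance('my-rds-instance', engine='mysql', storage_type='gp2', skip_final_snapshot=True, **rds_settings(pulumi.Config()))`, after storing the password with `pulumi config set --secret dbPassword <value>`.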
The resources generated by this program include:
- Amazon S3 bucket
- AWS IAM role and policy attachment
- AWS Lambda function
- Amazon RDS instance
This is a starting point, and you can enhance and tweak this program based on your other requirements such as VPC setup, security group settings, RDS instance details, etc. Please ensure you have a good understanding of security best practices while configuring your resources.