ETL pipeline with Amazon Redshift and AWS Glue
This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
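To make the extract and transform stages more concrete, here is a minimal, dependency-free sketch of the kind of arguments a Glue crawler and a Glue job might take. It is not this example's actual program. The field names (`databaseName`, `role`, `s3Targets`, `roleArn`, `command`, `defaultArguments`) are assumed to mirror the `@pulumi/aws` Glue crawler and job arguments, and every value (the catalog database name, role ARN, script location, and the `--target_table` job parameter) is a hypothetical placeholder.

```typescript
// Assumed shape of Glue crawler arguments (mirrors @pulumi/aws as an assumption).
interface CrawlerArgs {
    databaseName: string;            // Glue Data Catalog database the crawler writes to
    role: string;                    // IAM role the crawler assumes
    s3Targets: { path: string }[];   // S3 locations to crawl
}

// Assumed shape of Glue job arguments (mirrors @pulumi/aws as an assumption).
interface JobArgs {
    roleArn: string;                                              // IAM role the job assumes
    command: { scriptLocation: string; pythonVersion: string };   // Python script to run
    defaultArguments: Record<string, string>;                     // parameters passed to the script
}

// Extract stage: point the crawler at the data bucket.
function crawlerArgs(bucketName: string, catalogDb: string, roleArn: string): CrawlerArgs {
    return {
        databaseName: catalogDb,
        role: roleArn,
        s3Targets: [{ path: `s3://${bucketName}` }],
    };
}

// Transform and load stage: run a Python script stored in S3.
// "--target_table" is a hypothetical script parameter, not part of this example.
function jobArgs(scriptBucket: string, scriptKey: string, roleArn: string, targetTable: string): JobArgs {
    return {
        roleArn,
        command: { scriptLocation: `s3://${scriptBucket}/${scriptKey}`, pythonVersion: "3" },
        defaultArguments: { "--target_table": targetTable },
    };
}

// Placeholder values for illustration only.
const exampleRole = "arn:aws:iam::123456789012:role/hypothetical-glue-role";
const crawler = crawlerArgs("events-56e424a", "hypothetical_catalog_db", exampleRole);
const job = jobArgs("hypothetical-scripts", "glue-etl.py", exampleRole, "events");

console.log(JSON.stringify({ crawler, job }, null, 2));
```

In a real Pulumi program, objects like these would be passed to the corresponding resource constructors, with the bucket name and role ARN coming from other resources rather than literals.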
Deploying the App
Clone this repo, change to this directory, then create a new stack for the project:
pulumi stack init
Specify an AWS region to deploy into:
pulumi config set aws:region us-west-2
Install Node dependencies and run Pulumi:
npm install
pulumi up
In a few moments, the Redshift cluster and Glue components will be up and running, and the S3 bucket name will be emitted as a Pulumi stack output:
...
Outputs:
    dataBucketName: "events-56e424a"
Upload the included sample data file to S3 to verify the automation works as expected:
aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
When you’re ready, destroy your stack and remove it:
pulumi destroy --yes
pulumi stack rm --yes