ETL pipeline with Amazon Redshift and AWS Glue | Python
This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
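The full program lives in `__main__.py`. As a rough orientation, a simplified sketch of the resources it wires together might look like the following. Resource names, the Redshift sizing, the IAM policy, and the Glue script location here are illustrative assumptions rather than the example's exact code, and the VPC/subnet configuration around the Redshift cluster is omitted for brevity.

```python
"""Simplified sketch of the resources declared in __main__.py (illustrative only)."""

import pulumi
import pulumi_aws as aws

config = pulumi.Config()

# S3 bucket that receives the raw event files (the "extract" source).
data_bucket = aws.s3.Bucket("events")

# IAM role that allows Glue to crawl the bucket and run the ETL job.
glue_role = aws.iam.Role(
    "glue-role",
    assume_role_policy="""{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }]
    }""",
)
aws.iam.RolePolicyAttachment(
    "glue-service-policy",
    role=glue_role.name,
    policy_arn="arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
)

# Glue Data Catalog database plus a crawler that catalogs objects in the bucket.
catalog_db = aws.glue.CatalogDatabase("catalog-db", name="events_db")
crawler = aws.glue.Crawler(
    "events-crawler",
    database_name=catalog_db.name,
    role=glue_role.arn,
    s3_targets=[aws.glue.CrawlerS3TargetArgs(
        path=data_bucket.bucket.apply(lambda name: f"s3://{name}"),
    )],
)

# Glue job wrapping the Python transform script (assumed to be staged in S3).
glue_job = aws.glue.Job(
    "transform-job",
    role_arn=glue_role.arn,
    glue_version="3.0",
    command=aws.glue.JobCommandArgs(
        script_location=data_bucket.bucket.apply(lambda name: f"s3://{name}/glue-job.py"),
        python_version="3",
    ),
)

# Single-node Redshift cluster that receives the transformed data ("load").
redshift = aws.redshift.Cluster(
    "etl-cluster",
    cluster_identifier="etl-cluster",
    database_name="dev",
    master_username="awsuser",
    master_password=config.require_secret("redshiftPassword"),  # hypothetical config key
    node_type="dc2.large",
    cluster_type="single-node",
    skip_final_snapshot=True,
)

# Exported so `pulumi stack output dataBucketName` works later in this walkthrough.
pulumi.export("dataBucketName", data_bucket.bucket)
```

The final export is what backs the `pulumi stack output dataBucketName` command used below.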
Prerequisites
- Install Pulumi.
- Install Python.
- Configure your AWS credentials.
Deploying the App
Clone this repo, change to this directory, then create a new stack for the project:
pulumi stack init
Specify an AWS region to deploy into:
pulumi config set aws:region us-west-2
Install Python dependencies and run Pulumi:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pulumi up
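The `requirements.txt` installed above pulls in the Pulumi SDK and the AWS Classic provider; a minimal version might look like this (the repo's own file is authoritative):

```
pulumi>=3.0.0,<4.0.0
pulumi-aws>=5.28.0,<6.0.0
```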
In a few moments, the Redshift cluster and Glue components will be up and running, and the S3 bucket name will be emitted as a Pulumi stack output.
...
Outputs:
    dataBucketName: "events-56e424a"
Upload the included sample data file to S3 to verify the automation works as expected:
aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
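Besides confirming the upload succeeded, you can check that the Glue side has picked up the data, for example by inspecting the crawler's state with boto3. The crawler name below is a placeholder; substitute the one created by the Pulumi program (visible via `pulumi stack` or in the AWS console).

```python
import boto3

glue = boto3.client("glue", region_name="us-west-2")

# "events-crawler" is a hypothetical name; use the crawler name from your stack.
crawler = glue.get_crawler(Name="events-crawler")["Crawler"]
print("Crawler state:", crawler["State"])
print("Last crawl status:", crawler.get("LastCrawl", {}).get("Status"))
```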
When you’re ready, destroy your stack and remove it:
pulumi destroy --yes
pulumi stack rm --yes