AWS Classic v5.28.0, Jan 23 23

ETL pipeline with Amazon Redshift and AWS Glue | Python

This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
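To make the moving parts concrete, here is a minimal Pulumi sketch of the resources described above. It is not the example's actual program. The config keys (`glueRoleArn`, `subnetIds`, `dbPassword`, `jobScriptLocation`), the resource names other than the bucket, and the Redshift sizing are assumptions chosen for illustration. IAM policies, security groups, and the Glue connection that lets the job reach Redshift are omitted.

```python
import pulumi
import pulumi_aws as aws

config = pulumi.Config()
# Hypothetical config keys; the real project may name or source these differently.
glue_role_arn = config.require("glueRoleArn")        # IAM role assumed by Glue
subnet_ids = config.require_object("subnetIds")      # list of VPC subnet IDs
db_password = config.require_secret("dbPassword")    # Redshift master password
job_script = config.require("jobScriptLocation")     # s3:// URI of the Glue job script

# S3 bucket that receives the raw event files.
data_bucket = aws.s3.Bucket("events")

# Extract: a Glue catalog database plus a crawler that scans the bucket.
catalog_db = aws.glue.CatalogDatabase("glue-catalog", name="events_catalog")
crawler = aws.glue.Crawler(
    "glue-crawler",
    database_name=catalog_db.name,
    role=glue_role_arn,
    s3_targets=[aws.glue.CrawlerS3TargetArgs(
        path=data_bucket.bucket.apply(lambda name: f"s3://{name}"),
    )],
)

# Transform: a Glue job wrapping a Python script.
job = aws.glue.Job(
    "glue-job",
    role_arn=glue_role_arn,
    glue_version="3.0",
    command=aws.glue.JobCommandArgs(
        script_location=job_script,
        python_version="3",
    ),
)

# Load: a single-node Redshift cluster placed in the VPC's subnets.
subnet_group = aws.redshift.SubnetGroup("redshift-subnets", subnet_ids=subnet_ids)
cluster = aws.redshift.Cluster(
    "redshift-cluster",
    cluster_identifier="etl-sketch-cluster",
    database_name="dev",
    master_username="awsuser",
    master_password=db_password,
    node_type="ra3.xlplus",
    cluster_type="single-node",
    cluster_subnet_group_name=subnet_group.name,
    skip_final_snapshot=True,
)

# Matches the stack output name used in the deployment steps.
pulumi.export("dataBucketName", data_bucket.bucket)
```

The sketch reads its network and credential inputs from stack configuration so that no secrets or account-specific IDs appear in code. A complete program would more likely create the VPC, role, and connection itself.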

Prerequisites

  1. Install Pulumi.
  2. Install Python.
  3. Configure your AWS credentials.

Deploying the App

  1. Clone this repo, change to this directory, then create a new stack for the project:

    pulumi stack init
    
  2. Specify an AWS region to deploy into:

    pulumi config set aws:region us-west-2
    
  3. Install Python dependencies and run Pulumi:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
    pulumi up
    
  4. After a few moments, the Redshift cluster and Glue components will be up and running, and the S3 bucket name will be emitted as a Pulumi stack output:

    ...
    Outputs:
        dataBucketName: "events-56e424a"
    
  5. Upload the included sample data file to S3 to verify the automation works as expected:

    aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
    
  6. When you’re ready, destroy your stack and remove it:

    pulumi destroy --yes
    pulumi stack rm --yes
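
Steps 4 and 5 can also be scripted. The following is a minimal sketch, assuming the `pulumi` and `aws` CLIs are on your `PATH` and that it runs from the project directory. The helper names (`stack_output_command`, `upload_command`, `upload_sample`) are hypothetical and not part of the example.

```python
from __future__ import annotations

import subprocess


def stack_output_command(name: str) -> list[str]:
    """Build the CLI call that prints a single Pulumi stack output."""
    return ["pulumi", "stack", "output", name]


def upload_command(filename: str, bucket: str) -> list[str]:
    """Build the CLI call that copies a local file to the root of the bucket."""
    return ["aws", "s3", "cp", filename, f"s3://{bucket}"]


def upload_sample(filename: str = "events-1.txt") -> None:
    """Look up the bucket name from the stack, then upload the sample file."""
    result = subprocess.run(
        stack_output_command("dataBucketName"),
        check=True,            # raise if the pulumi command fails
        capture_output=True,
        text=True,
    )
    bucket = result.stdout.strip()
    subprocess.run(upload_command(filename, bucket), check=True)
```

Calling `upload_sample()` has the same effect as the one-line `aws s3 cp` command in step 5. Building the commands as argument lists avoids shell quoting issues.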