Try AWS Native preview for resources not in the classic version.

AWS Classic v6.24.0 published on Tuesday, Feb 27, 2024 by Pulumi

ETL pipeline with Amazon Redshift and AWS Glue

    This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
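
The three stages described above can be sketched in plain Python. This is only an illustration of the extract, transform, load flow, not the script shipped with the example: it assumes newline-delimited JSON events with hypothetical fields `id`, `name`, and `date`, and it uses an in-memory list as a stand-in for the Redshift table.

```python
import json


def extract(raw_text):
    """Parse newline-delimited JSON, skipping blank lines (assumed input format)."""
    return [json.loads(line) for line in raw_text.splitlines() if line.strip()]


def transform(events):
    """Keep only the assumed columns and normalize the event name."""
    return [
        {"id": e["id"], "name": e["name"].strip().lower(), "date": e["date"]}
        for e in events
    ]


def load(rows, table):
    """Stand-in for the Redshift load: append rows to an in-memory list."""
    table.extend(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample records, made up for this sketch.
    sample = (
        '{"id": 1, "name": " Signup ", "date": "2024-02-27"}\n'
        '{"id": 2, "name": "Login", "date": "2024-02-27"}\n'
    )
    table = []
    count = load(transform(extract(sample)), table)
    print(f"loaded {count} rows")
```

In the deployed pipeline, the Glue crawler and Glue job play the roles of `extract` and `transform`, and the load targets the Redshift database rather than a list.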

    Prerequisites

    1. Install Pulumi.
    2. Install Python.
    3. Configure your AWS credentials.
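
Before deploying, it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, assuming the `pulumi` and `aws` executables are expected on `PATH` (the AWS CLI is used by the upload step later in this guide). The Python 3.8 floor is an assumption, not a requirement stated by this guide, and the script does not validate AWS credentials.

```python
import shutil
import sys

# Tools this guide invokes: the Pulumi CLI for deployment and the AWS CLI
# for the later `aws s3 cp` upload step.
REQUIRED_TOOLS = ("pulumi", "aws")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Check the interpreter version; the (3, 8) floor is an assumption."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
    print("Python version OK:", python_ok())
```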

    Deploying the App

    1. Clone this repo, change to this directory, then create a new stack for the project:

      pulumi stack init
      
    2. Specify an AWS region to deploy into:

      pulumi config set aws:region us-west-2
      
    3. Create and activate a virtual environment, install the Python dependencies, and run Pulumi:

      python3 -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt
      
      pulumi up
      
    4. Once the update completes, the Redshift cluster and Glue components will be up and running, and the S3 bucket name is emitted as a Pulumi stack output:

      ...
      Outputs:
          dataBucketName: "events-56e424a"
      
    5. Upload the included sample data file to S3 to verify the automation works as expected:

      aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
      
    6. When you’re ready, destroy your stack and remove it:

      pulumi destroy --yes
      pulumi stack rm --yes
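
The upload in step 5 can also be scripted. The sketch below reads the `dataBucketName` stack output with `pulumi stack output` and builds the matching `aws s3 cp` command. The helper names are hypothetical, and the `__main__` section is a dry run that only prints the command for the bucket name shown in step 4 instead of executing anything.

```python
import shlex
import subprocess


def bucket_name_from_stack(run=subprocess.run):
    """Read the dataBucketName stack output via the Pulumi CLI."""
    result = run(
        ["pulumi", "stack", "output", "dataBucketName"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def upload_command(bucket, path="events-1.txt"):
    """Build the `aws s3 cp` argument list used in step 5."""
    return ["aws", "s3", "cp", path, f"s3://{bucket}"]


if __name__ == "__main__":
    # Dry run: print the command for the example bucket name from step 4.
    print(shlex.join(upload_command("events-56e424a")))
```

To perform the real upload against a deployed stack, pass the result of `upload_command(bucket_name_from_stack())` to `subprocess.run(..., check=True)`.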
      