Posts Tagged redshift

Building an ETL pipeline with Amazon Redshift and AWS Glue

In our last episode, Deploying a Data Warehouse with Pulumi and Amazon Redshift, we covered using Pulumi to load unstructured data from Amazon S3 into an Amazon Redshift cluster. That went well, but you may recall that at the end of that post, we were left with a few unanswered questions:

  • How do we avoid importing and processing the same data twice?
  • How can we transform the data during the ingestion process?
  • What are our options for loading data automatically — for example, on a regular schedule?

These are the kinds of questions you’ll almost always have when setting up a data-processing (or ETL) pipeline — and every platform tends to answer them a little differently.

Deploying a Data Warehouse with Pulumi and Amazon Redshift

It’s fun to think about how much data is swirling around in the global datasphere these days. However you choose to measure it (and there are various ways), it’s a quantity so massive that it’s hard to get your head around: hundreds of zettabytes, by some estimates.

If you could convert all the world’s data into droplets of water at one megabyte per drop, for instance, you’d have enough drops to fill two Lake Washingtons. If you could store all that data on 3.5" floppies, you’d need more than a hundred quadrillion of them: enough to cover the planet entirely (with much room for overlap) or to pave a nice bridge for yourself from your front porch well into interstellar space. If you could pull all that data into an HD movie, and you sat down to start watching that movie 2.5 million years ago (with your favorite saber-toothed friend, say), you’d still be watching the same movie today.

Using AWS Quick Starts with the Pulumi Registry

As somebody who works on AWS projects across numerous teams and industries, I see three common types of infrastructure problems. I think the Pulumi Registry provides an incredible solution to each of these problems and will fundamentally change how people interact with AWS.