Danny Zalkind is the Senior Director of Infrastructure Engineering for Skai, an award-winning intelligent marketing platform. He brings his 15 years of experience of managing tech teams to his current role where he’s dedicated to allow Skai R&D to efficiently produce and serve software. You can find him on Linkedin. As Skai continues its journey towards fully migrating to the cloud using Pulumi, we’ve taken another large bite out of the migration pie, moving our most critical data to AWS on top of Amazon Keyspaces, an Apache Cassandra–compatible database service.
In our last episode, Deploying a Data Warehouse with Pulumi and Amazon Redshift, we covered using Pulumi to load unstructured data from Amazon S3 into an Amazon Redshift cluster. That went well, but you may recall that at the end of that post, we were left with a few unanswered questions: How do we avoid importing and processing the same data twice? How can we transform the data during the ingestion process?
It’s fun to think about how much data there is swirling around in the global datasphere these days. However you choose to measure it (and there are various ways), it’s a quantity so massive — hundreds of zettabytes, by some estimates — that it’s kind of a hard thing to quite get your head around. If you could convert all the world’s data into droplets of water, for instance, at one megabyte per drop, you’d have enough 1MB drops to fill two more Lake Washingtons.