ETL jobs using AWS DataBrew
To create ETL jobs using AWS DataBrew, we can make use of the aws_native.databrew.Job and aws_native.databrew.Recipe resources. These resources from the aws_native package allow you to control a DataBrew job and recipe, respectively.

The aws_native.databrew.Job resource defines a DataBrew job that transforms and analyzes datasets. It is a core component of ETL (Extract, Transform, Load) workflows. An aws_native.databrew.Recipe is a set of steps to be performed on data by a job defined in AWS Glue DataBrew.

Below is a simple Pulumi program that demonstrates how to set up an AWS DataBrew job with a recipe:
import pulumi
from pulumi_aws_native import databrew as _databrew

# Create an AWS Glue DataBrew recipe with two transformation steps.
recipe = _databrew.Recipe("recipe",
    description="A sample recipe",
    steps=[
        {
            "action": {
                "operation": "REPLACE_ALL_TEXT",
                "parameters": {
                    "columnNames": ["column1", "column2"],
                    "find": "find-text",
                    "replaceWith": "replacement-text",
                },
            },
        },
        {
            "action": {
                "operation": "REMOVE_DUPLICATE_ROWS",
                "parameters": {
                    "targetColumnNames": ["column1", "column2"],
                },
            },
        },
    ])

# Create an AWS Glue DataBrew job that applies the recipe to the dataset.
job = _databrew.Job("job",
    dataset_name="sample-dataset",
    type="RECIPE",  # a job that applies a recipe must be of type RECIPE, not PROFILE
    role_arn="arn:aws:iam::account-id:role/role-name",
    recipe={
        "name": recipe.name,
    },
    outputs=[{
        "compressionFormat": "GZIP",
        "format": "CSV",
        "location": {
            "bucket": "s3-output-bucket",
            "key": "output-directory/",
        },
    }])

# Export the names of the created resources.
pulumi.export("recipeName", recipe.name)
pulumi.export("jobName", job.name)
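The job above refers to an existing DataBrew dataset by name. If you want the dataset to be managed by the same program, you could define it with the aws_native.databrew.Dataset resource. The following is a minimal sketch under the assumption that the source data is a CSV file in S3; the bucket name and object key are placeholders to replace with your own.

# A sketch of a DataBrew dataset backed by a CSV file in S3.
# "sample-dataset", "s3-input-bucket", and "input-data.csv" are placeholders.
dataset = _databrew.Dataset("dataset",
    name="sample-dataset",
    format="CSV",
    input={
        "s3InputDefinition": {
            "bucket": "s3-input-bucket",
            "key": "input-data.csv",
        },
    })

# The job can then reference the dataset by its resource output instead of a
# hard-coded string, e.g. dataset_name=dataset.name.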
Please note that you will need to replace "sample-dataset", "arn:aws:iam::account-id:role/role-name", and "s3-output-bucket" with your actual dataset name, role ARN, and output S3 bucket name, respectively. For more information, see the AWS Glue DataBrew and Pulumi documentation.
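The role ARN must point to an IAM role that DataBrew can assume and that has access to the input data and the output bucket. As a rough sketch, such a role could be created alongside the job using the classic pulumi_aws provider (which can coexist with pulumi_aws_native in the same program); the bucket names and the exact permissions below are assumptions to adapt to your account.

import json
import pulumi_aws as aws

# A sketch of an IAM role that the DataBrew job can assume.
databrew_role = aws.iam.Role("databrewRole",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "databrew.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }))

# Grant the role read/write access to the placeholder input and output buckets.
aws.iam.RolePolicy("databrewS3Access",
    role=databrew_role.id,
    policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::s3-input-bucket",
                "arn:aws:s3:::s3-input-bucket/*",
                "arn:aws:s3:::s3-output-bucket",
                "arn:aws:s3:::s3-output-bucket/*",
            ],
        }],
    }))

# The job would then use role_arn=databrew_role.arn instead of the hard-coded ARN.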