How do I set up AWS Glue with parameters?

Setting Up AWS Glue with Parameters

In this guide, we’ll create a parameterized AWS Glue job with Pulumi. Parameters let one ETL (extract, transform, load) script run against different inputs, which makes the script more flexible and reusable. We’ll define a Glue database, a Glue crawler, and a Glue job whose definition carries the parameters as default arguments.

Key Components

  • AWS Glue Database: A container in the Glue Data Catalog that groups the table metadata for your data.
  • AWS Glue Crawler: Automatically updates the metadata catalog with schema details.
  • AWS Glue Job: Runs your ETL scripts.
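  • Parameters: Key-value arguments passed to the script so that the same script can be reused with different inputs.

The following Pulumi program, written in TypeScript, defines these resources:

import * as pulumi from "@pulumi/pulumi";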
import * as aws from "@pulumi/aws";

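// Catalog database that holds the table metadata discovered by the crawler.
const example = new aws.glue.CatalogDatabase("example", {name: "example_database"});
// Crawler that scans the S3 path and records the schema in the database.
const exampleCrawler = new aws.glue.Crawler("example", {
    name: "example_crawler",
    // Placeholder ARN; replace it with an IAM role that Glue can assume in your account.
    role: "arn:aws:iam::123456789012:role/service-role/AWSGlueServiceRole",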
    databaseName: example.name,
    s3Targets: [{
        path: "s3://example-bucket/path/",
    }],
});
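// ETL job that runs the script stored in S3.
const exampleJob = new aws.glue.Job("example", {
    name: "example_job",
    // Placeholder ARN; replace it with an IAM role that Glue can assume in your account.
    roleArn: "arn:aws:iam::123456789012:role/service-role/AWSGlueServiceRole",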
    command: {
        scriptLocation: "s3://example-bucket/scripts/example-script.py",
        name: "glueetl",
    },
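    // Default arguments are passed to the script on every run; each key starts with "--".
    defaultArguments: {
        "--job-language": "python",
        "--TempDir": "s3://example-bucket/temp/",
        // User-defined parameter; the script reads it under the name "parameter1".
        "--parameter1": "value1",
    },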
    maxRetries: 3,
    glueVersion: "2.0",
    numberOfWorkers: 10,
    workerType: "G.1X",
});
export const glueCrawlerName = exampleCrawler.name;
export const glueJobName = exampleJob.name;
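
As the number of job parameters grows, it may be convenient to build the `defaultArguments` map from a plain object instead of typing each `--`-prefixed key by hand. The sketch below shows one way to do that. The helper name `toGlueArguments` and the sample parameter set are hypothetical; they are not part of Pulumi or AWS Glue.

```typescript
// Hypothetical helper (not part of Pulumi or AWS Glue): converts a plain
// parameter map into the "--key": "value" form used for Glue job arguments.
function toGlueArguments(
    params: Record<string, string | number | boolean>,
): Record<string, string> {
    const args: Record<string, string> = {};
    for (const key of Object.keys(params)) {
        // Leave keys that already carry the "--" prefix unchanged.
        const argName = key.slice(0, 2) === "--" ? key : `--${key}`;
        // Job arguments are string-valued, so convert numbers and booleans.
        args[argName] = String(params[key]);
    }
    return args;
}

// Illustrative parameter set that mirrors the arguments used in the job above.
const jobArguments = toGlueArguments({
    "job-language": "python",
    TempDir: "s3://example-bucket/temp/",
    parameter1: "value1",
});
```

Under these assumptions, `jobArguments` could be passed as the `defaultArguments` value of the `aws.glue.Job` resource. Inside a Python Glue script, such arguments are typically read with `getResolvedOptions` from `awsglue.utils`.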

Summary

This program sets up a Glue database, a crawler, and a job. The job's default arguments include a user-defined parameter, so the same script can run with different inputs, which improves the reusability and maintainability of your ETL processes.
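
Running one job with different inputs can be illustrated without any AWS calls. Arguments supplied for an individual job run take precedence over the job's default arguments that share the same key. The sketch below models that merge locally; `resolveRunArguments` is a hypothetical name used only for illustration, and the real resolution happens inside the Glue service.

```typescript
// Hypothetical model of how per-run arguments combine with a job's defaults.
// The actual merge is performed by the Glue service, not by this function.
function resolveRunArguments(
    defaults: Record<string, string>,
    overrides: Record<string, string>,
): Record<string, string> {
    // Later spreads win, so a per-run value replaces the default for that key.
    return { ...defaults, ...overrides };
}

// Defaults mirroring the job definition in this guide.
const jobDefaults: Record<string, string> = {
    "--job-language": "python",
    "--TempDir": "s3://example-bucket/temp/",
    "--parameter1": "value1",
};

// A run that overrides only the user-defined parameter (value is illustrative).
const nightlyRun = resolveRunArguments(jobDefaults, {
    "--parameter1": "nightly",
});
```

In practice, per-run overrides would be supplied when the run is started, for example through the `--arguments` option of `aws glue start-job-run`, rather than merged by hand.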

Key Points:

  • Defined an AWS Glue database to organize your metadata.
  • Created an AWS Glue crawler to discover the schema automatically.
  • Set up an AWS Glue job with user-defined parameters for flexibility.
  • Exported the Glue job and crawler names as stack outputs so they are easy to identify.
