Creating AWS Glue Spark ETL Jobs Using Terraform

How do I create AWS Glue Spark ETL jobs using Pulumi?

To set up AWS Glue Spark ETL jobs, you’ll need to define several AWS resources: an IAM role and its policies, a Glue database, and the Glue job itself. Here’s a step-by-step guide on how to achieve this:

  1. IAM Role and Policy: Glue requires an IAM role with the necessary permissions to access resources such as S3 buckets and other data stores.
  2. Glue Database: This organizes table metadata in the AWS Glue Data Catalog.
  3. Glue Job: This specifies the ETL job script and its properties.

Here’s a complete example:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// IAM Role for AWS Glue
const glueRole = new aws.iam.Role("glue_role", {
    name: "glue-role",
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Principal: {
                Service: "glue.amazonaws.com",
            },
            Action: "sts:AssumeRole",
        }],
    }),
});
// IAM Policy for Glue
const gluePolicy = new aws.iam.RolePolicy("glue_policy", {
    role: glueRole.id,
    policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
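            // Read/write access to the data bucket and the ETL script location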
            {
                Effect: "Allow",
                Action: [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                ],
                Resource: [
                    "arn:aws:s3:::my-bucket",
                    "arn:aws:s3:::my-bucket/*",
                ],
            },
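            // CloudWatch Logs permissions so the job can write its logs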
            {
                Effect: "Allow",
                Action: ["logs:*"],
                Resource: "arn:aws:logs:*:*:*",
            },
        ],
    }),
});
// Glue Database
const glueDatabase = new aws.glue.CatalogDatabase("glue_database", {name: "my_glue_database"});
// Glue ETL Job
const glueJob = new aws.glue.Job("glue_job", {
    name: "my_etl_job",
    roleArn: glueRole.arn,
    command: {
        name: "glueetl",
        scriptLocation: "s3://my-bucket/scripts/my-etl-script.py",
        pythonVersion: "3",
    },
    glueVersion: "3.0",
    // Glue 2.0+ Spark jobs are sized with workers rather than maxCapacity
    numberOfWorkers: 10,
    workerType: "G.1X",
    timeout: 60, // minutes
});
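// Export the role ARN and resource names as stack outputs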
export const glueRoleArn = glueRole.arn;
export const glueDatabaseName = glueDatabase.name;
export const glueJobName = glueJob.name;

This program sets up the IAM role with the required permissions, creates a Glue database, and defines the Glue Spark ETL job. Finally, it exports the role ARN and the database and job names, which you can use to track and manage your Glue resources.
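The job’s scriptLocation assumes the ETL script already sits at s3://my-bucket/scripts/my-etl-script.py. If you would rather have Pulumi manage the script as well, a minimal sketch like the one below, appended to the program above, uploads it from a local file; the local path ./scripts/my-etl-script.py is hypothetical, and the bucket itself is assumed to already exist or be managed elsewhere.

// Upload the ETL script so the job's scriptLocation resolves to a Pulumi-managed object
const etlScript = new aws.s3.BucketObject("etl_script", {
    bucket: "my-bucket",              // assumed existing bucket
    key: "scripts/my-etl-script.py",  // matches the job's scriptLocation
    source: new pulumi.asset.FileAsset("./scripts/my-etl-script.py"), // hypothetical local path
});

If you use this pattern, adding the resource option dependsOn: [etlScript] to the Glue job ensures the script is uploaded before the job is created.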

In summary, you’ve created an AWS Glue Spark ETL job configuration complete with an IAM role, the required policies, and a Glue database using an infrastructure-as-code approach.
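If you also want the Glue database to hold table metadata for your data, one common pattern is to point a Glue crawler at the data location and have it write tables into that database. The sketch below assumes the data lives under s3://my-bucket/data/ (a hypothetical path); note that in practice the role typically needs broader Glue permissions for crawling (for example the AWS-managed AWSGlueServiceRole policy) than the inline policy shown above grants.

// A crawler that scans the data location and records table metadata in the Glue database
const glueCrawler = new aws.glue.Crawler("glue_crawler", {
    name: "my-etl-crawler",
    databaseName: glueDatabase.name,
    role: glueRole.arn,
    s3Targets: [{
        path: "s3://my-bucket/data/", // assumed data location
    }],
});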
