
How do I schedule AWS EMR Serverless jobs using AWS Scheduler?

In this guide, we demonstrate how to schedule AWS EMR Serverless jobs with AWS Scheduler (Amazon EventBridge Scheduler, exposed in Pulumi as aws.scheduler). We create an EMR Serverless application and set up a schedule that starts a job run on that application at a fixed time each day.

Key Points

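  • AWS EMR Serverless Application: A serverless runtime for big data frameworks such as Spark and Hive. Work is submitted to the application as job runs.
  • AWS Scheduler (Amazon EventBridge Scheduler): A service that invokes a target, such as an AWS API call, on a cron or rate schedule.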

Steps

  1. Create an EMR Serverless Application: Define the application type, release label, and capacity limits.
  2. Set Up AWS Scheduler: Create an IAM role that Scheduler can assume and a schedule whose target starts a job run on the application.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create an EMR Serverless Application
const emrApp = new aws.emrserverless.Application("emrApp", {
    name: "my-emr-serverless-app",
    type: "SPARK", // Application type: SPARK or HIVE
    releaseLabel: "emr-6.6.0", // EMR Serverless requires release emr-6.6.0 or later
    maximumCapacity: {
        cpu: "4 vCPU",
        memory: "16 GB",
    },
    initialCapacities: [{
        initialCapacityType: "Driver", // Spark worker types are "Driver" and "Executor"
        initialCapacityConfig: {
            workerCount: 1,
            workerConfiguration: {
                cpu: "2 vCPU",
                memory: "8 GB",
            },
        },
    }],
});

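// Create an IAM role that Scheduler assumes when it starts the job run
const schedulerRole = new aws.iam.Role("schedulerRole", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "scheduler.amazonaws.com" }),
});

// Allow the Scheduler role to start job runs and to pass the job execution role
const schedulerRolePolicy = new aws.iam.RolePolicy("schedulerRolePolicy", {
    role: schedulerRole.id,
    // The policy contains no Pulumi Outputs, so plain JSON.stringify is safe here
    policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [
            {
                Effect: "Allow",
                // EMR Serverless actions use the emr-serverless prefix, not emr
                Action: ["emr-serverless:StartJobRun"],
                Resource: "*", // Narrow to the application ARN as needed
            },
            {
                Effect: "Allow",
                // StartJobRun hands the job execution role to EMR Serverless
                Action: ["iam:PassRole"],
                Resource: "*", // Narrow to the job execution role ARN as needed
                Condition: {
                    StringLike: { "iam:PassedToService": "emr-serverless.amazonaws.com" },
                },
            },
        ],
    }),
});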

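// Create the job execution role that EMR Serverless assumes while the job runs.
// Attach policies that grant access to your script, data, and log locations.
const jobRole = new aws.iam.Role("emrJobRole", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "emr-serverless.amazonaws.com" }),
});

// Create an AWS Scheduler schedule that calls the EMR Serverless StartJobRun API
const schedule = new aws.scheduler.Schedule("emrSchedule", {
    name: "my-emr-schedule",
    scheduleExpression: "cron(0 12 * * ? *)", // Every day at 12:00 UTC (the default time zone)
    flexibleTimeWindow: {
        mode: "OFF",
    },
    target: {
        // Universal target ARN for the StartJobRun API call; the application itself is not a valid target
        arn: "arn:aws:scheduler:::aws-sdk:emrserverless:startJobRun",
        roleArn: schedulerRole.arn, // Role that Scheduler assumes to make the call
        // Resolve the Outputs first; JSON.stringify cannot serialize a Pulumi Output directly
        input: pulumi.all([emrApp.id, jobRole.arn]).apply(([applicationId, executionRoleArn]) =>
            JSON.stringify({
                ApplicationId: applicationId,
                ExecutionRoleArn: executionRoleArn, // Role the job itself runs as
                ClientToken: "<aws.scheduler.execution-id>", // Idempotency token; Scheduler substitutes a unique ID per invocation
                Name: "my-emr-job",
                JobDriver: {
                    SparkSubmit: {
                        EntryPoint: "s3://my-bucket/my-script.py", // Replace with your script location
                    },
                },
                ConfigurationOverrides: {
                    MonitoringConfiguration: {
                        S3MonitoringConfiguration: {
                            LogUri: "s3://my-bucket/logs/", // Replace with your log location
                        },
                    },
                },
            })),
    },
});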
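
The target input is the part of this setup that is easiest to get wrong, so it can help to build it in a small, separately testable function. The following is a minimal sketch, not part of the original answer: the helper name buildStartJobRunInput and the JobRunOptions type are hypothetical, and it assumes the Scheduler universal target expects the StartJobRun parameters in PascalCase and replaces the <aws.scheduler.execution-id> placeholder on each invocation.

```typescript
// Hypothetical options type for this sketch; not part of any Pulumi or AWS SDK.
interface JobRunOptions {
    applicationId: string;    // EMR Serverless application ID
    executionRoleArn: string; // Role the job runs as (trusts emr-serverless.amazonaws.com)
    entryPoint: string;       // S3 URI of the Spark script
    name?: string;            // Optional job run name
    logUri?: string;          // Optional S3 URI for job logs
}

// Builds the JSON string passed as the schedule target's input.
// Assumption: parameter names are PascalCase for Scheduler universal targets.
function buildStartJobRunInput(opts: JobRunOptions): string {
    const payload: Record<string, unknown> = {
        ApplicationId: opts.applicationId,
        ExecutionRoleArn: opts.executionRoleArn,
        // Assumption: Scheduler substitutes this context attribute at invocation time
        ClientToken: "<aws.scheduler.execution-id>",
        JobDriver: { SparkSubmit: { EntryPoint: opts.entryPoint } },
    };
    if (opts.name !== undefined) {
        payload["Name"] = opts.name;
    }
    if (opts.logUri !== undefined) {
        payload["ConfigurationOverrides"] = {
            MonitoringConfiguration: {
                S3MonitoringConfiguration: { LogUri: opts.logUri },
            },
        };
    }
    return JSON.stringify(payload);
}
```

In a Pulumi program the application ID and role ARN are Outputs, so the function would be called inside pulumi.all([...]).apply(...) once both values have resolved.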

Summary

In this guide, we created an AWS EMR Serverless application and an AWS Scheduler schedule that starts a job run on it at a fixed time each day. The schedule uses a cron expression to define when it fires and an IAM role to grant Scheduler the permissions it needs.
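
The cron expression used for the schedule has six fields: minutes, hours, day of month, month, day of week, and year, with ? in either the day-of-month or the day-of-week position. As a rough sketch, a daily expression can be generated with a small helper. The function name dailyCron is hypothetical and not part of Pulumi or the AWS SDK.

```typescript
// Hypothetical helper: returns a Scheduler cron expression that fires once a day.
// Times are interpreted in UTC unless the schedule sets a different time zone.
function dailyCron(hour: number, minute: number = 0): string {
    if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
        throw new RangeError("hour must be an integer between 0 and 23");
    }
    if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
        throw new RangeError("minute must be an integer between 0 and 59");
    }
    // Field order: minutes hours day-of-month month day-of-week year
    return `cron(${minute} ${hour} * * ? *)`;
}
```

For example, dailyCron(12) produces the expression used in this guide, cron(0 12 * * ? *), and dailyCron(6, 30) produces cron(30 6 * * ? *).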

By following this guide, you can automate the execution of your EMR Serverless jobs using AWS Scheduler and Pulumi.
