How do I schedule AWS EMR Serverless jobs using AWS Scheduler?
In this guide, we schedule Amazon EMR Serverless jobs with Amazon EventBridge Scheduler (the aws.scheduler module in Pulumi). We create an EMR Serverless application and a schedule that starts a job run on that application at a fixed time each day.
Key Points
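- AWS EMR Serverless Application: A serverless runtime for big data workloads such as Spark and Hive. Work is submitted to the application as job runs.
- AWS Scheduler: Amazon EventBridge Scheduler, a service that invokes AWS API actions on a cron, rate, or one-time schedule.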
Steps
- Create an EMR Serverless Application: Define the application type, release label, and capacity limits.
- Set Up AWS Scheduler: Create an IAM role and a schedule that starts a job run on the EMR Serverless application.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create an EMR Serverless Application
const emrApp = new aws.emrserverless.Application("emrApp", {
    name: "my-emr-serverless-app",
    type: "spark", // Application type: "spark" or "hive"
    releaseLabel: "emr-6.6.0", // EMR Serverless requires release emr-6.6.0 or later
    // Upper bound on the resources the application may use across all jobs
    maximumCapacity: {
        cpu: "4 vCPU",
        memory: "16 GB",
    },
    // Pre-initialized workers that are kept ready for the Spark driver
    initialCapacities: [{
        initialCapacityType: "DRIVER",
        initialCapacityConfig: {
            workerCount: 1,
            workerConfiguration: {
                cpu: "2 vCPU",
                memory: "8 GB",
            },
        },
    }],
});
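
// Create the job execution role that EMR Serverless assumes while the job runs.
// It must trust emr-serverless.amazonaws.com, so the scheduler role cannot be reused here.
const jobRole = new aws.iam.Role("emrJobRole", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "emr-serverless.amazonaws.com" }),
});

// Let the job read its script and write logs. Replace my-bucket with your bucket.
const jobRolePolicy = new aws.iam.RolePolicy("emrJobRolePolicy", {
    role: jobRole.id,
    policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            Resource: ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        }],
    }),
});

// Create an IAM Role that the Scheduler assumes to start the job run
const schedulerRole = new aws.iam.Role("schedulerRole", {
    assumeRolePolicy: aws.iam.assumeRolePolicyForPrincipal({ Service: "scheduler.amazonaws.com" }),
});

// Allow the scheduler role to start job runs and to pass the job execution role
const schedulerRolePolicy = new aws.iam.RolePolicy("schedulerRolePolicy", {
    role: schedulerRole.id,
    policy: pulumi.jsonStringify({
        Version: "2012-10-17",
        Statement: [
            {
                Effect: "Allow",
                Action: ["emr-serverless:StartJobRun"], // IAM prefix is emr-serverless, not emr
                Resource: emrApp.arn, // Adjust the resource as needed
            },
            {
                Effect: "Allow",
                Action: ["iam:PassRole"],
                Resource: jobRole.arn,
            },
        ],
    }),
});

// Create an AWS Scheduler Schedule
const schedule = new aws.scheduler.Schedule("emrSchedule", {
    name: "my-emr-schedule",
    scheduleExpression: "cron(0 12 * * ? *)", // Every day at 12:00 UTC
    flexibleTimeWindow: {
        mode: "OFF",
    },
    target: {
        // Universal target that calls the EMR Serverless StartJobRun API.
        // The application itself is not a valid schedule target.
        arn: "arn:aws:scheduler:::aws-sdk:emrserverless:startJobRun",
        roleArn: schedulerRole.arn,
        // The input is the StartJobRun request body with PascalCase keys
        input: pulumi.jsonStringify({
            ApplicationId: emrApp.id,
            ExecutionRoleArn: jobRole.arn, // Role the job runs as
            Name: "my-emr-job",
            // StartJobRun takes an idempotency token; this assumes the scheduler's
            // execution ID context attribute is acceptable as that token.
            ClientToken: "<aws.scheduler.execution-id>",
            JobDriver: {
                SparkSubmit: {
                    EntryPoint: "s3://my-bucket/my-script.py", // Replace with your script location
                },
            },
            ConfigurationOverrides: {
                MonitoringConfiguration: {
                    S3MonitoringConfiguration: {
                        LogUri: "s3://my-bucket/logs/",
                    },
                },
            },
        }),
    },
});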
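
The schedule's input is the request body for the EMR Serverless StartJobRun call, so it can help to build it in one place. The following is a minimal sketch, not part of the original program: the names SparkJobOptions and buildStartJobRunInput are hypothetical, and it assumes the scheduler expects PascalCase keys (ApplicationId, ExecutionRoleArn, JobDriver.SparkSubmit, and so on) for this API. It works on plain strings, so inside a Pulumi program the Output values would first need to be resolved, for example with pulumi.all(...).apply(...).

```typescript
// Hypothetical options type for a Spark job run; not part of any AWS or Pulumi SDK.
interface SparkJobOptions {
    applicationId: string;
    executionRoleArn: string;
    entryPoint: string;
    entryPointArguments?: string[];
    sparkSubmitParameters?: string;
    logUri?: string;
    name?: string;
}

// Build the JSON string passed as the schedule target's input.
// Assumption: keys follow the PascalCase form of the StartJobRun request.
function buildStartJobRunInput(opts: SparkJobOptions): string {
    const sparkSubmit: Record<string, unknown> = { EntryPoint: opts.entryPoint };
    if (opts.entryPointArguments && opts.entryPointArguments.length > 0) {
        sparkSubmit.EntryPointArguments = opts.entryPointArguments;
    }
    if (opts.sparkSubmitParameters) {
        sparkSubmit.SparkSubmitParameters = opts.sparkSubmitParameters;
    }

    const body: Record<string, unknown> = {
        ApplicationId: opts.applicationId,
        ExecutionRoleArn: opts.executionRoleArn,
        JobDriver: { SparkSubmit: sparkSubmit },
    };
    if (opts.name) {
        body.Name = opts.name;
    }
    // Only add log configuration when a log location was given.
    if (opts.logUri) {
        body.ConfigurationOverrides = {
            MonitoringConfiguration: {
                S3MonitoringConfiguration: { LogUri: opts.logUri },
            },
        };
    }
    return JSON.stringify(body);
}

// Example with placeholder values.
const exampleInput = buildStartJobRunInput({
    applicationId: "00example123",
    executionRoleArn: "arn:aws:iam::123456789012:role/example-job-role",
    entryPoint: "s3://my-bucket/my-script.py",
    entryPointArguments: ["--date", "2024-01-01"],
    logUri: "s3://my-bucket/logs/",
    name: "my-emr-job",
});
```

Keeping the payload in a helper makes it easier to reuse the same job definition across several schedules that differ only in their arguments.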
Summary
In this guide, we created an AWS EMR Serverless application and set up an AWS Scheduler schedule that starts a job run on it every day at 12:00 UTC. The schedule uses a cron expression to define the timing and an IAM role to grant the scheduler the permissions it needs.
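
The cron expression used for the schedule has six fields: minutes, hours, day-of-month, month, day-of-week, and year. As a small sketch of how such expressions could be generated rather than typed by hand, the helpers below build a daily and a weekday-only expression. The function names dailyCron and weekdayCron are hypothetical, and the times are in UTC unless the schedule sets a time zone.

```typescript
// Validate that hour and minute are integers inside their allowed ranges.
function checkTime(hour: number, minute: number): void {
    if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
        throw new RangeError(`hour must be an integer in 0..23, got ${hour}`);
    }
    if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
        throw new RangeError(`minute must be an integer in 0..59, got ${minute}`);
    }
}

// Fire once every day at the given time.
// Day-of-month is "*", so day-of-week must be "?" (no specific value).
function dailyCron(hour: number, minute: number = 0): string {
    checkTime(hour, minute);
    return `cron(${minute} ${hour} * * ? *)`;
}

// Fire Monday through Friday at the given time.
// Day-of-week is set, so day-of-month must be "?".
function weekdayCron(hour: number, minute: number = 0): string {
    checkTime(hour, minute);
    return `cron(${minute} ${hour} ? * MON-FRI *)`;
}

const noonDaily = dailyCron(12);            // same expression as in this guide
const weekdayMorning = weekdayCron(6, 30);  // 06:30 on weekdays
```

The result can be passed directly as the scheduleExpression of the schedule.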
By following this guide, you can automate the execution of your EMR Serverless jobs using AWS Scheduler and Pulumi.