Using AWS EMR with Serverless
To use AWS EMR with Serverless, you need to create an Amazon EMR Serverless application. Amazon EMR Serverless is an AWS offering for running big data analytics applications on popular open source frameworks without managing clusters or servers. It automatically provisions and scales the resources your application needs.
Below, I'll walk you through creating an EMR Serverless application using Pulumi and AWS. To start, we define an `Application` resource from the `aws.emrserverless` package.

The following TypeScript program sets up an EMR Serverless application. This example assumes you are running a Spark application, but you can change the application type to fit your use case (EMR Serverless also supports Hive).
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create a Serverless Application with EMR on AWS.
const sparkApplication = new aws.emrserverless.Application("mySparkApplication", {
    releaseLabel: "emr-6.6.0",
    type: "SPARK",
    name: "MySparkApplication",
    // Define the maximum and initial capacity for the application.
    maximumCapacity: {
        cpu: "4 vCPU",   // Maximum CPU capacity
        memory: "16 GB", // Maximum memory
        disk: "50 GB",   // Optional maximum disk capacity
    },
    initialCapacities: [{
        // For Spark applications the worker type is "Driver" or "Executor".
        initialCapacityType: "Driver",
        initialCapacityConfig: {
            workerCount: 1, // Initial number of workers
            workerConfiguration: {
                cpu: "2 vCPU",
                memory: "8 GB",
            },
        },
    }],
    // Networking setup, replace with your subnet and security group IDs
    networkConfiguration: {
        subnetIds: ["subnet-xxxxxxxxxxxxxxxxx"],
        securityGroupIds: ["sg-xxxxxxxxxxxxxxxxx"],
    },
    // Automatically stop the application after an idle time (in minutes)
    autoStopConfiguration: {
        enabled: true,
        idleTimeoutMinutes: 15,
    },
    // Optionally, automatically start the application when a job is submitted
    autoStartConfiguration: {
        enabled: false,
    },
    // Image configuration for customization or a consistent runtime environment.
    // Only specify this if you're using a custom image (needs a newer release than emr-6.6.0).
    imageConfiguration: {
        imageUri: "path-to-your-custom-emr-image",
    },
    // Tags to identify or manage the application in billing or other cloud management tools
    tags: {
        "env": "production",
        "owner": "your-name-or-team",
    },
});

// To access the application, you might want to output some of its attributes.
export const sparkApplicationId = sparkApplication.id;
```
This program does the following:
- It imports the necessary Pulumi and AWS modules.
- It creates an EMR Serverless application configured for Spark. Only `releaseLabel` and `type` are required; `name`, `maximumCapacity`, `initialCapacities`, `networkConfiguration`, `autoStopConfiguration`, `autoStartConfiguration`, `imageConfiguration`, and `tags` are optional settings.
- It exports the application ID as a stack output.
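The capacity settings can be sanity-checked before deployment. The sketch below is a hypothetical helper (`parseAmount` and `fitsWithinMaximum` are illustrative names, not part of Pulumi or the AWS SDK). It assumes capacity strings follow the `"4 vCPU"` / `"16 GB"` form used in the program, and that the pre-initialized workers should fit within `maximumCapacity`.

```typescript
// Hypothetical helper types and functions for illustration only.
interface WorkerSpec {
    workerCount: number;
    cpu: string;    // e.g. "2 vCPU"
    memory: string; // e.g. "8 GB"
}

// Parse a string such as "4 vCPU" or "16 GB" into its numeric amount.
// Throws if the string does not end with the expected unit.
function parseAmount(value: string, unit: "vCPU" | "GB"): number {
    const pattern = new RegExp(`^(\\d+(?:\\.\\d+)?)\\s*${unit}$`, "i");
    const match = value.trim().match(pattern);
    if (!match) {
        throw new Error(`Expected a value like "4 ${unit}", got "${value}"`);
    }
    return Number(match[1]);
}

// Return true when the combined CPU and memory of all pre-initialized workers
// stay within the configured maximum capacity.
function fitsWithinMaximum(
    workers: WorkerSpec[],
    maximum: { cpu: string; memory: string },
): boolean {
    const totalCpu = workers.reduce(
        (sum, w) => sum + w.workerCount * parseAmount(w.cpu, "vCPU"), 0);
    const totalMemory = workers.reduce(
        (sum, w) => sum + w.workerCount * parseAmount(w.memory, "GB"), 0);
    return totalCpu <= parseAmount(maximum.cpu, "vCPU")
        && totalMemory <= parseAmount(maximum.memory, "GB");
}
```

With the values from the program (one driver with 2 vCPU and 8 GB against a 4 vCPU, 16 GB maximum), the check passes; raising `workerCount` to 3 would exceed the CPU limit.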
Remember to replace placeholder values such as `subnet-xxxxxxxxxxxxxxxxx`, `sg-xxxxxxxxxxxxxxxxx`, and `path-to-your-custom-emr-image` with actual values that suit your AWS setup. If you are not using a custom image, omit the `imageConfiguration` property. Custom images require release emr-6.9.0 or later, so with `emr-6.6.0` the property should be left out.

To deploy this program, you need Pulumi installed and configured for AWS access. Here are the general steps:
- Ensure AWS CLI is configured with the necessary credentials and default region.
- Install Pulumi CLI on your machine.
- Create a new directory for your project and switch to it.
- Inside the directory, run `pulumi new aws-typescript` to create a new Pulumi TypeScript project.
- Replace the contents of the auto-generated `index.ts` file with the code provided above.
- Finally, run `pulumi up` to preview and deploy the resources.
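Before running `pulumi up`, it can help to catch placeholder IDs that were never replaced. The following is a minimal sketch with hypothetical helper names. It assumes subnet and security group IDs consist of the `subnet-` or `sg-` prefix followed by 8 or 17 hexadecimal characters.

```typescript
// Hypothetical helpers for illustration; not part of Pulumi or the AWS SDK.

// Assumed ID format: prefix followed by 8 or 17 lowercase hex characters.
function looksLikeSubnetId(id: string): boolean {
    return /^subnet-(?:[0-9a-f]{8}|[0-9a-f]{17})$/.test(id);
}

function looksLikeSecurityGroupId(id: string): boolean {
    return /^sg-(?:[0-9a-f]{8}|[0-9a-f]{17})$/.test(id);
}

// Return every ID that does not look like a real resource ID,
// for example the "subnet-xxxxxxxxxxxxxxxxx" placeholder.
function findPlaceholders(subnetIds: string[], securityGroupIds: string[]): string[] {
    return [
        ...subnetIds.filter((id) => !looksLikeSubnetId(id)),
        ...securityGroupIds.filter((id) => !looksLikeSecurityGroupId(id)),
    ];
}
```

Calling `findPlaceholders` with the two arrays passed to `networkConfiguration` returns an empty array once every placeholder has been replaced.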
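This Pulumi program deploys an AWS EMR Serverless application that can process jobs using the Spark framework. Because `autoStartConfiguration` is disabled in this example, the application must be started explicitly (or auto-start enabled) before it runs submitted jobs. You can interact with the application through the AWS SDK or CLI using the application ID, which is exported at the end of the program.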
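As a rough illustration of using the exported application ID, the sketch below builds the request payload for submitting a Spark job. `buildSparkJobRunInput` is a hypothetical helper, and the application ID, role ARN, and S3 path in the usage note are placeholders, not values from this guide. The field names follow the EMR Serverless `StartJobRun` request shape as I understand it; check them against the SDK version you use.

```typescript
// Shape of a StartJobRun request for a Spark job (subset of fields).
interface SparkJobRunInput {
    applicationId: string;
    executionRoleArn: string;
    jobDriver: {
        sparkSubmit: {
            entryPoint: string;
            entryPointArguments?: string[];
        };
    };
}

// Hypothetical helper: assemble the payload from the application ID exported
// by the Pulumi program, an IAM role the job runs as, and a script location.
function buildSparkJobRunInput(
    applicationId: string,
    executionRoleArn: string,
    entryPoint: string,
    entryPointArguments: string[] = [],
): SparkJobRunInput {
    return {
        applicationId,
        executionRoleArn,
        jobDriver: {
            sparkSubmit: { entryPoint, entryPointArguments },
        },
    };
}

// With the AWS SDK for JavaScript v3 installed, the payload could be sent
// roughly like this (shown as comments so the sketch has no dependencies):
//   import { EMRServerlessClient, StartApplicationCommand, StartJobRunCommand }
//       from "@aws-sdk/client-emr-serverless";
//   const client = new EMRServerlessClient({});
//   await client.send(new StartApplicationCommand({ applicationId }));
//   await client.send(new StartJobRunCommand(input));
```

For example, `buildSparkJobRunInput("00example123", "arn:aws:iam::123456789012:role/ExampleJobRole", "s3://example-bucket/scripts/job.py")` yields an object ready to pass to `StartJobRunCommand`.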