How do I process data streams with Flink on EMR Serverless?
In this guide, we set up an Amazon EMR Serverless application for Flink using Pulumi. Because EMR Serverless provisions and scales the workers for you, you can process data streams without managing clusters yourself. The program below creates the application, sets its initial and maximum capacity, points it at a custom image, and attaches it to a VPC subnet and security group.
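import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Define the EMR Serverless application
const emrServerlessApp = new aws.emrserverless.Application("flinkApp", {
    name: "flink-stream-processing",
    // Check this value against the application types EMR Serverless supports in
    // your region; the Pulumi provider docs give "spark" and "hive" as examples.
    type: "Flink",
    // EMR Serverless requires emr-6.6.0 or later, and custom images require
    // emr-6.9.0 or later (the original emr-6.4.0 predates both).
    releaseLabel: "emr-6.9.0",
    // Upper bound on the resources the whole application may use
    maximumCapacity: {
        cpu: "8vCPU",
        memory: "32GB",
    },
    // Workers that are pre-initialized when the application starts
    initialCapacities: [{
        // Worker type names depend on the framework
        // (for example "Driver" and "Executor" for Spark).
        initialCapacityType: "worker",
        initialCapacityConfig: {
            workerCount: 2,
            workerConfiguration: {
                cpu: "4vCPU",
                memory: "16GB",
            },
        },
    }],
    // Custom image stored in Amazon ECR
    imageConfiguration: {
        imageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-flink-image:latest",
    },
    // Placeholder IDs; replace them with a subnet and security group from your VPC
    networkConfiguration: {
        subnetIds: ["subnet-0bb1c79de3EXAMPLE"],
        securityGroupIds: ["sg-0bb1c79de3EXAMPLE"],
    },
    // Stop the application after 15 idle minutes
    autoStopConfiguration: {
        enabled: true,
        idleTimeoutMinutes: 15,
    },
    // Start the application automatically when a job is submitted
    autoStartConfiguration: {
        enabled: true,
    },
});

// Export the application ID
export const applicationId = emrServerlessApp.id;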
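Before deploying, it can help to check that the pre-initialized workers fit under the maximum capacity. The sketch below is plain TypeScript and is not part of the Pulumi or AWS APIs: `parseCapacity` and `checkCapacity` are hypothetical helper names, and the accepted string format (a positive integer followed by `vCPU` or `GB`, with an optional space) is an assumption based on the values used in this guide.

```typescript
// Hypothetical helpers for illustration; not part of @pulumi/aws.

// Parse strings such as "4vCPU", "4 vCPU", "16GB", or "16 GB" into a number.
function parseCapacity(value: string, unit: "vCPU" | "GB"): number {
    const match = new RegExp(`^([1-9][0-9]*)\\s?${unit}$`, "i").exec(value.trim());
    if (!match) {
        throw new Error(`Expected a value like "4${unit}", got "${value}"`);
    }
    return Number(match[1]);
}

interface WorkerPool {
    workerCount: number;
    cpu: string;
    memory: string;
}

interface CapacityLimit {
    cpu: string;
    memory: string;
}

// Return a list of problems; an empty list means the pools fit under the limit.
function checkCapacity(pools: WorkerPool[], limit: CapacityLimit): string[] {
    const totalCpu = pools.reduce(
        (sum, p) => sum + p.workerCount * parseCapacity(p.cpu, "vCPU"), 0);
    const totalMemory = pools.reduce(
        (sum, p) => sum + p.workerCount * parseCapacity(p.memory, "GB"), 0);

    const problems: string[] = [];
    if (totalCpu > parseCapacity(limit.cpu, "vCPU")) {
        problems.push(`initial CPU ${totalCpu} vCPU exceeds maximum ${limit.cpu}`);
    }
    if (totalMemory > parseCapacity(limit.memory, "GB")) {
        problems.push(`initial memory ${totalMemory} GB exceeds maximum ${limit.memory}`);
    }
    return problems;
}

// Values taken from the program in this guide.
const capacityProblems = checkCapacity(
    [{ workerCount: 2, cpu: "4vCPU", memory: "16GB" }],
    { cpu: "8vCPU", memory: "32GB" },
);
console.log(capacityProblems.length === 0
    ? "initial capacity fits within the maximum"
    : capacityProblems.join("\n"));
```

With the values from this guide, two workers at 4 vCPU and 16 GB each add up to exactly 8 vCPU and 32 GB. The initial capacity therefore fits, but it leaves no headroom under the maximum for additional workers, so you may want a higher `maximumCapacity` if the workload needs to scale out.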
Key Points
- EMR Serverless Application: The `aws.emrserverless.Application` resource defines the application, with `type` set to Flink and `releaseLabel` selecting the EMR release.
- Capacity Configuration: `initialCapacities` pre-initializes two workers with 4 vCPU and 16 GB each, and `maximumCapacity` caps the whole application at 8 vCPU and 32 GB.
- Image Configuration: `imageConfiguration.imageUri` points to the custom Docker image in Amazon ECR that the workers run.
- Network Configuration: `networkConfiguration` supplies the subnet and security group IDs that give the workers access to resources in your VPC.
- Auto Start/Stop Configuration: The application starts automatically when a job is submitted and stops after 15 minutes of idle time.
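The image URI in this guide encodes the AWS account, region, repository, and tag. As a minimal sketch, the plain TypeScript below splits an ECR URI into those parts and flags two things worth reviewing: a region that differs from the one you deploy the application to, and an image referenced by a mutable tag such as `latest` rather than a digest. `parseEcrImageUri` and `imageWarnings` are hypothetical helper names, and the same-region check assumes EMR Serverless expects the image to live in the application's region, which you should confirm in the EMR Serverless custom image documentation.

```typescript
// Hypothetical helpers for illustration; not part of @pulumi/aws.

interface EcrImageRef {
    accountId: string;
    region: string;
    repository: string;
    tag?: string;
    digest?: string;
}

// Split "<account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag][@sha256:...]".
function parseEcrImageUri(uri: string): EcrImageRef {
    const pattern =
        /^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com\/([^:@]+)(?::([^@]+))?(?:@(sha256:[0-9a-f]{64}))?$/;
    const match = pattern.exec(uri);
    if (!match) {
        throw new Error(`Not an ECR image URI: ${uri}`);
    }
    const [, accountId, region, repository, tag, digest] = match;
    return { accountId, region, repository, tag, digest };
}

// Return review notes for an image; an empty list means nothing was flagged.
function imageWarnings(image: EcrImageRef, applicationRegion: string): string[] {
    const warnings: string[] = [];
    if (image.region !== applicationRegion) {
        warnings.push(
            `image is in ${image.region} but the application is in ${applicationRegion}`);
    }
    if (!image.digest) {
        warnings.push(
            `image is referenced by tag "${image.tag ?? "latest"}"; a digest pins the exact build`);
    }
    return warnings;
}

// URI taken from the program in this guide; "us-west-2" is an assumed deployment region.
const image = parseEcrImageUri(
    "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-flink-image:latest");
for (const warning of imageWarnings(image, "us-west-2")) {
    console.log(warning);
}
```

For the URI used in this guide and a `us-west-2` deployment, only the tag note is reported, because `:latest` can point to a different build each time the image is pushed.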
Summary
We created an EMR Serverless application for Flink using Pulumi. The setup includes defining the application’s capacity, specifying the Docker image, configuring network settings, and enabling auto start/stop features. This setup allows you to process data streams efficiently on AWS without managing the underlying infrastructure.