This page documents the language specification for the aws package. If you're looking for help working with the inputs, outputs, or functions of aws resources in a Pulumi program, please see the resource documentation for examples and API reference.

emr

This provider is a derived work of the Terraform Provider distributed under MPL 2.0. If you encounter a bug or missing feature, first check the pulumi/pulumi-aws repo; however, if that doesn’t turn up anything, please consult the source terraform-providers/terraform-provider-aws repo.

class pulumi_aws.emr.Cluster(resource_name, opts=None, additional_info=None, applications=None, autoscaling_role=None, bootstrap_actions=None, configurations=None, configurations_json=None, core_instance_count=None, core_instance_group=None, core_instance_type=None, custom_ami_id=None, ebs_root_volume_size=None, ec2_attributes=None, instance_groups=None, keep_job_flow_alive_when_no_steps=None, kerberos_attributes=None, log_uri=None, master_instance_group=None, master_instance_type=None, name=None, release_label=None, scale_down_behavior=None, security_configuration=None, service_role=None, step_concurrency_level=None, steps=None, tags=None, termination_protection=None, visible_to_all_users=None, __props__=None, __name__=None, __opts__=None)

Provides an Elastic MapReduce (EMR) cluster. Amazon EMR is a web service that makes it easy to process large amounts of data efficiently. See the Amazon Elastic MapReduce Documentation for more information.

To configure Instance Groups for task nodes, see the emr.InstanceGroup resource.

Support for Instance Fleets will be made available in an upcoming release.

import pulumi
import pulumi_aws as aws

cluster = aws.emr.Cluster("cluster",
    additional_info="""{
  "instanceAwsClientConfiguration": {
    "proxyPort": 8099,
    "proxyHost": "myproxy.example.com"
  }
}

""",
    applications=["Spark"],
    bootstrap_actions=[{
        "args": [
            "instance.isMaster=true",
            "echo running on master node",
        ],
        "name": "runif",
        "path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    }],
    configurations_json="""  [
    {
      "Classification": "hadoop-env",
      "Configurations": [
        {
          "Classification": "export",
          "Properties": {
            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
          }
        }
      ],
      "Properties": {}
    },
    {
      "Classification": "spark-env",
      "Configurations": [
        {
          "Classification": "export",
          "Properties": {
            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
          }
        }
      ],
      "Properties": {}
    }
  ]

""",
    core_instance_group={
        "autoscaling_policy": """{
"Constraints": {
  "MinCapacity": 1,
  "MaxCapacity": 2
},
"Rules": [
  {
    "Name": "ScaleOutMemoryPercentage",
    "Description": "Scale out if YARNMemoryAvailablePercentage is less than 15",
    "Action": {
      "SimpleScalingPolicyConfiguration": {
        "AdjustmentType": "CHANGE_IN_CAPACITY",
        "ScalingAdjustment": 1,
        "CoolDown": 300
      }
    },
    "Trigger": {
      "CloudWatchAlarmDefinition": {
        "ComparisonOperator": "LESS_THAN",
        "EvaluationPeriods": 1,
        "MetricName": "YARNMemoryAvailablePercentage",
        "Namespace": "AWS/ElasticMapReduce",
        "Period": 300,
        "Statistic": "AVERAGE",
        "Threshold": 15.0,
        "Unit": "PERCENT"
      }
    }
  }
]
}

""",
        "bid_price": "0.30",
        "ebs_configs": [{
            "size": 40,
            "type": "gp2",
            "volumesPerInstance": 1,
        }],
        "instance_count": 1,
        "instance_type": "c4.large",
    },
    ebs_root_volume_size=100,
    # allow_access, emr_profile, main_subnet and iam_emr_service_role are assumed
    # to be defined elsewhere in the program (see the complete example further down).
    ec2_attributes={
        "emrManagedMasterSecurityGroup": allow_access.id,
        "emrManagedSlaveSecurityGroup": allow_access.id,
        "instanceProfile": emr_profile.arn,
        "subnet_id": main_subnet.id,
    },
    keep_job_flow_alive_when_no_steps=True,
    master_instance_group={
        "instance_type": "m4.large",
    },
    release_label="emr-4.6.0",
    service_role=iam_emr_service_role.arn,
    tags={
        "env": "env",
        "role": "rolename",
    },
    termination_protection=False)

import pulumi
import pulumi_aws as aws

example = aws.emr.Cluster("example",
    steps=[{
        "actionOnFailure": "TERMINATE_CLUSTER",
        "hadoopJarStep": {
            "args": ["state-pusher-script"],
            "jar": "command-runner.jar",
        },
        "name": "Setup Hadoop Debugging",
    }],
    # Ignore changes to steps and stepConcurrencyLevel, which may also be
    # modified outside this program.
    opts=pulumi.ResourceOptions(ignore_changes=[
        "stepConcurrencyLevel",
        "steps",
    ]))

import pulumi
import pulumi_aws as aws

# Map public IP on launch must be enabled for public (Internet accessible) subnets
example_subnet = aws.ec2.Subnet("exampleSubnet", map_public_ip_on_launch=True)  # other required arguments (vpc_id, cidr_block) omitted
example_cluster = aws.emr.Cluster("exampleCluster",
    core_instance_group={},  # core group settings omitted
    ec2_attributes={
        "subnet_id": example_subnet.id,
    },
    master_instance_group={
        "instance_count": 3,
    },
    release_label="emr-5.24.1",
    termination_protection=True)

NOTE: This configuration demonstrates a minimal configuration needed to boot an example EMR Cluster. It is not meant to display best practices. Please use at your own risk.

import pulumi
import pulumi_aws as aws

main_vpc = aws.ec2.Vpc("mainVpc",
    cidr_block="168.31.0.0/16",
    enable_dns_hostnames=True,
    tags={
        "name": "emr_test",
    })
main_subnet = aws.ec2.Subnet("mainSubnet",
    vpc_id=main_vpc.id,
    cidr_block="168.31.0.0/20",
    tags={
        "name": "emr_test",
    })
# IAM role for EMR Service
iam_emr_service_role = aws.iam.Role("iamEmrServiceRole", assume_role_policy="""{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
""")
# IAM Role for EC2 Instance Profile
iam_emr_profile_role = aws.iam.Role("iamEmrProfileRole", assume_role_policy="""{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
""")
emr_profile = aws.iam.InstanceProfile("emrProfile", roles=[iam_emr_profile_role.name])
# Security group for the cluster nodes; declared before the cluster that references it.
allow_access = aws.ec2.SecurityGroup("allowAccess",
    description="Allow inbound traffic",
    vpc_id=main_vpc.id,
    ingress=[{
        "from_port": 0,
        "to_port": 0,
        "protocol": "-1",
        "cidr_blocks": [main_vpc.cidr_block],
    }],
    egress=[{
        "from_port": 0,
        "to_port": 0,
        "protocol": "-1",
        "cidr_blocks": ["0.0.0.0/0"],
    }],
    tags={
        "name": "emr_test",
    })
cluster = aws.emr.Cluster("cluster",
    release_label="emr-4.6.0",
    applications=["Spark"],
    ec2_attributes={
        "subnet_id": main_subnet.id,
        "emrManagedMasterSecurityGroup": allow_access.id,
        "emrManagedSlaveSecurityGroup": allow_access.id,
        "instanceProfile": emr_profile.arn,
    },
    master_instance_type="m5.xlarge",
    core_instance_type="m5.xlarge",
    core_instance_count=1,
    tags={
        "role": "rolename",
        "dns_zone": "env_zone",
        "env": "env",
        "name": "name-env",
    },
    bootstrap_actions=[{
        "path": "s3://elasticmapreduce/bootstrap-actions/run-if",
        "name": "runif",
        "args": [
            "instance.isMaster=true",
            "echo running on master node",
        ],
    }],
    configurations_json="""  [
    {
      "Classification": "hadoop-env",
      "Configurations": [
        {
          "Classification": "export",
          "Properties": {
            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
          }
        }
      ],
      "Properties": {}
    },
    {
      "Classification": "spark-env",
      "Configurations": [
        {
          "Classification": "export",
          "Properties": {
            "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
          }
        }
      ],
      "Properties": {}
    }
  ]
""",
    service_role=iam_emr_service_role.arn)
gw = aws.ec2.InternetGateway("gw", vpc_id=main_vpc.id)
route_table = aws.ec2.RouteTable("routeTable",
    vpc_id=main_vpc.id,
    route=[{
        "cidr_block": "0.0.0.0/0",
        "gateway_id": gw.id,
    }])
main_route_table_association = aws.ec2.MainRouteTableAssociation("mainRouteTableAssociation",
    vpc_id=main_vpc.id,
    route_table_id=route_table.id)
# Inline policies for the EMR service role and the EC2 instance profile role
iam_emr_service_policy = aws.iam.RolePolicy("iamEmrServicePolicy",
    role=iam_emr_service_role.id,
    policy="""{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Resource": "*",
        "Action": [
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateNetworkInterface",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:DeleteNetworkInterface",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteTags",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribeKeyPairs",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribePrefixLists",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcEndpointServices",
            "ec2:DescribeVpcs",
            "ec2:DetachNetworkInterface",
            "ec2:ModifyImageAttribute",
            "ec2:ModifyInstanceAttribute",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RunInstances",
            "ec2:TerminateInstances",
            "ec2:DeleteVolume",
            "ec2:DescribeVolumeStatus",
            "ec2:DescribeVolumes",
            "ec2:DetachVolume",
            "iam:GetRole",
            "iam:GetRolePolicy",
            "iam:ListInstanceProfiles",
            "iam:ListRolePolicies",
            "iam:PassRole",
            "s3:CreateBucket",
            "s3:Get*",
            "s3:List*",
            "sdb:BatchPutAttributes",
            "sdb:Select",
            "sqs:CreateQueue",
            "sqs:Delete*",
            "sqs:GetQueue*",
            "sqs:PurgeQueue",
            "sqs:ReceiveMessage"
        ]
    }]
}
""")
iam_emr_profile_policy = aws.iam.RolePolicy("iamEmrProfilePolicy",
    role=iam_emr_profile_role.id,
    policy="""{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Resource": "*",
        "Action": [
            "cloudwatch:*",
            "dynamodb:*",
            "ec2:Describe*",
            "elasticmapreduce:Describe*",
            "elasticmapreduce:ListBootstrapActions",
            "elasticmapreduce:ListClusters",
            "elasticmapreduce:ListInstanceGroups",
            "elasticmapreduce:ListInstances",
            "elasticmapreduce:ListSteps",
            "kinesis:CreateStream",
            "kinesis:DeleteStream",
            "kinesis:DescribeStream",
            "kinesis:GetRecords",
            "kinesis:GetShardIterator",
            "kinesis:MergeShards",
            "kinesis:PutRecord",
            "kinesis:SplitShard",
            "rds:Describe*",
            "s3:*",
            "sdb:*",
            "sns:*",
            "sqs:*"
        ]
    }]
}
""")
Parameters
  • resource_name (str) – The name of the resource.

  • opts (pulumi.ResourceOptions) – Options for the resource.

  • additional_info (pulumi.Input[str]) – A JSON string for selecting additional features, such as adding proxy information. Note: there is currently no API to retrieve the value of this argument after the EMR cluster is created, so this provider cannot detect drift if the value is changed outside of it.

  • applications (pulumi.Input[list]) – A list of applications for the cluster. Valid values are: Flink, Hadoop, Hive, Mahout, Pig, Spark, and JupyterHub (as of EMR 5.14.0). Values are case-insensitive.

  • autoscaling_role (pulumi.Input[str]) – An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group.

  • bootstrap_actions (pulumi.Input[list]) – Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. Defined below.

  • configurations (pulumi.Input[str]) – List of configurations supplied for the EMR cluster you are creating.

  • configurations_json (pulumi.Input[str]) – A JSON string supplying a list of configurations for the EMR cluster.

  • core_instance_count (pulumi.Input[float]) – Use the instance_count argument of the core_instance_group configuration block instead. Number of Amazon EC2 instances used to execute the job flow. EMR uses one node as the cluster’s master node and the remaining nodes (core_instance_count - 1) as core nodes. Cannot be specified if the core_instance_group or instance_group configuration blocks are set. Defaults to 1.

  • core_instance_group (pulumi.Input[dict]) – Configuration block to use an Instance Group for the core node type. Cannot be specified if core_instance_count argument, core_instance_type argument, or instance_group configuration blocks are set. Detailed below.

  • core_instance_type (pulumi.Input[str]) – Use the core_instance_group configuration block instance_type argument instead. The EC2 instance type of the slave nodes. Cannot be specified if core_instance_group or instance_group configuration blocks are set.

  • custom_ami_id (pulumi.Input[str]) – A custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). Available in Amazon EMR version 5.7.0 and later.

  • ebs_root_volume_size (pulumi.Input[float]) – Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later.

  • ec2_attributes (pulumi.Input[dict]) – Attributes for the EC2 instances running the job flow. Defined below.

  • instance_groups (pulumi.Input[list]) – Use the master_instance_group configuration block, the core_instance_group configuration block, and emr.InstanceGroup resource(s) instead. A list of instance_group objects, one for each instance group in the cluster. Exactly one of master_instance_type and instance_group must be specified. If instance_group is set, it must contain a configuration block for at least the MASTER instance group type (as well as any additional instance groups). Cannot be specified if the master_instance_group or core_instance_group configuration blocks are set. Defined below.

  • keep_job_flow_alive_when_no_steps (pulumi.Input[bool]) – Whether to keep the cluster running when there are no steps or after all steps are complete (default is true).

  • kerberos_attributes (pulumi.Input[dict]) – Kerberos configuration for the cluster. Defined below.

  • log_uri (pulumi.Input[str]) – S3 bucket to write the job flow’s log files to. If no value is provided, logs are not created.

  • master_instance_group (pulumi.Input[dict]) – Configuration block to use an Instance Group for the master node type. Cannot be specified if the master_instance_type argument or instance_group configuration blocks are set. Detailed below.

  • master_instance_type (pulumi.Input[str]) – Use the master_instance_group configuration block instance_type argument instead. The EC2 instance type of the master node. Cannot be specified if master_instance_group or instance_group configuration blocks are set.

  • name (pulumi.Input[str]) – The name of the job flow (the EMR cluster).

  • release_label (pulumi.Input[str]) – The release label for the Amazon EMR release, for example emr-5.24.1.

  • scale_down_behavior (pulumi.Input[str]) – The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized.

  • security_configuration (pulumi.Input[str]) – The name of the security configuration to attach to the EMR cluster. Only valid for EMR clusters with release label emr-4.8.0 or later.

  • service_role (pulumi.Input[str]) – IAM role that will be assumed by the Amazon EMR service to access AWS resources.

  • step_concurrency_level (pulumi.Input[float]) – The number of steps that can be executed concurrently. The maximum value is 256. Only valid for EMR clusters with release label emr-5.28.0 or later. Defaults to 1.

  • steps (pulumi.Input[list]) – List of steps to run when creating the cluster. Defined below. If other steps are managed outside of this provider, using the ignoreChanges resource option (https://www.pulumi.com/docs/intro/concepts/programming-model/#ignorechanges) is highly recommended.

  • tags (pulumi.Input[dict]) – A map of tags to apply to the EMR cluster.

  • termination_protection (pulumi.Input[bool]) – Switch on/off termination protection (default is false, except when using multiple master nodes). Before attempting to destroy the resource when termination protection is enabled, this configuration must be applied with its value set to false.

  • visible_to_all_users (pulumi.Input[bool]) – Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Defaults to true.
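Several of the arguments above only work from a given release label onward (custom_ami_id from 5.7.0, security_configuration from 4.8.0, step_concurrency_level from 5.28.0). The plain-Python sketch below checks those limits before a cluster is declared. parse_release_label and check_cluster_args are hypothetical helpers, not part of pulumi_aws, and the lower bound of 1 for step_concurrency_level is an assumption.

```python
import re

# Minimum EMR release for each argument, as stated in the parameter list.
MIN_RELEASE = {
    "custom_ami_id": (5, 7, 0),
    "security_configuration": (4, 8, 0),
    "step_concurrency_level": (5, 28, 0),
}


def parse_release_label(label):
    """Turn a release label such as "emr-5.28.0" into (5, 28, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def check_cluster_args(release_label, **args):
    """Return human-readable problems with the given Cluster arguments."""
    release = parse_release_label(release_label)
    problems = []
    for name, minimum in MIN_RELEASE.items():
        if args.get(name) is not None and release < minimum:
            wanted = "emr-" + ".".join(str(part) for part in minimum)
            problems.append(f"{name} requires {wanted} or later")
    level = args.get("step_concurrency_level")
    # Upper bound from the docs; the lower bound of 1 is an assumption.
    if level is not None and not 1 <= level <= 256:
        problems.append("step_concurrency_level must be between 1 and 256")
    return problems


print(check_cluster_args("emr-4.6.0", step_concurrency_level=2))
# ['step_concurrency_level requires emr-5.28.0 or later']
```

Tuples compare element by element, so (5, 30, 0) < (5, 28, 0) is False and a newer release passes the check.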

The bootstrap_actions object supports the following:

  • args (pulumi.Input[list]) - List of command line arguments to pass to the bootstrap action script.

  • name (pulumi.Input[str]) - The name of the bootstrap action.

  • path (pulumi.Input[str]) - Location of the script to run during a bootstrap action. Can be either a location in Amazon S3 or on a local file system.
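The run-if bootstrap action used in the examples can be generated rather than written out by hand. A minimal sketch: run_if_action is a hypothetical helper that returns a dict with the name, path and args keys listed above, ready to be placed in the bootstrap_actions list. Because the list is ordered, entries run in the order they appear.

```python
# Path of the run-if script used in the examples on this page.
RUN_IF_PATH = "s3://elasticmapreduce/bootstrap-actions/run-if"


def run_if_action(condition, command, name="runif"):
    """Return one bootstrap_actions entry that runs `command` only when
    `condition` (for example "instance.isMaster=true") holds on the node."""
    return {
        "name": name,
        "path": RUN_IF_PATH,
        "args": [condition, command],
    }


# Bootstrap actions run in list order, before Hadoop starts on each node.
bootstrap_actions = [
    run_if_action("instance.isMaster=true", "echo running on master node"),
]
```

The resulting list can be passed as bootstrap_actions=bootstrap_actions to aws.emr.Cluster.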

The core_instance_group object supports the following:

  • autoscaling_policy (pulumi.Input[str]) - The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (pulumi.Input[str]) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (pulumi.Input[list]) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports

    • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB).

    • type (pulumi.Input[str]) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (pulumi.Input[str]) - The ID of the instance group.

  • instance_count (pulumi.Input[float]) - Target number of instances for the instance group. Must be at least 1. Defaults to 1.

  • instance_type (pulumi.Input[str]) - EC2 instance type for all instances in the instance group.

  • name (pulumi.Input[str]) - Friendly name given to the instance group.
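Because autoscaling_policy is a JSON string, building it from Python data with json.dumps avoids quoting and syntax mistakes in a hand-written string. The sketch below reproduces the memory-based scale-out rule from the first example; memory_scale_out_policy is a hypothetical helper, and the min/max sanity check is a local convenience rather than an EMR rule.

```python
import json


def memory_scale_out_policy(min_capacity, max_capacity, threshold=15.0):
    """Return an autoscaling_policy JSON string that adds one instance when
    YARNMemoryAvailablePercentage drops below `threshold` percent."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    policy = {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [{
            "Name": "ScaleOutMemoryPercentage",
            "Description": f"Scale out if YARNMemoryAvailablePercentage is less than {threshold:g}",
            "Action": {"SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": 1,
                "CoolDown": 300,
            }},
            "Trigger": {"CloudWatchAlarmDefinition": {
                "ComparisonOperator": "LESS_THAN",
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
            }},
        }],
    }
    return json.dumps(policy, indent=2)


# Dict suitable for the core_instance_group argument of aws.emr.Cluster.
core_instance_group = {
    "instance_type": "c4.large",
    "instance_count": 1,
    "autoscaling_policy": memory_scale_out_policy(1, 2),
}
```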

The ec2_attributes object supports the following:

  • additionalMasterSecurityGroups (pulumi.Input[str]) - String containing a comma separated list of additional Amazon EC2 security group IDs for the master node

  • additionalSlaveSecurityGroups (pulumi.Input[str]) - String containing a comma-separated list of additional Amazon EC2 security group IDs for the slave nodes.

  • emrManagedMasterSecurityGroup (pulumi.Input[str]) - Identifier of the Amazon EC2 EMR-Managed security group for the master node

  • emrManagedSlaveSecurityGroup (pulumi.Input[str]) - Identifier of the Amazon EC2 EMR-Managed security group for the slave nodes

  • instanceProfile (pulumi.Input[str]) - Instance profile whose role the cluster’s EC2 instances assume.

  • key_name (pulumi.Input[str]) - Amazon EC2 key pair that can be used to SSH to the master node as the user hadoop.

  • serviceAccessSecurityGroup (pulumi.Input[str]) - Identifier of the Amazon EC2 service-access security group - required when the cluster runs on a private subnet

  • subnet_id (pulumi.Input[str]) - VPC subnet ID where you want the job flow to launch. The cc1.4xlarge instance type cannot be used for nodes of a job flow launched in an Amazon VPC.
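additionalMasterSecurityGroups and additionalSlaveSecurityGroups each take a single comma-separated string rather than a list. The sketch below joins plain ID strings; comma_separated is a hypothetical helper and the sg-/subnet- IDs are placeholders. When the IDs are Pulumi outputs (for example security_group.id), the join would have to happen inside pulumi.Output.all(...).apply(...) instead, since outputs are not plain strings at program time.

```python
def comma_separated(group_ids):
    """Join security group IDs into one comma-separated string, dropping
    blanks and surrounding whitespace."""
    ids = [gid.strip() for gid in group_ids if gid.strip()]
    if any("," in gid for gid in ids):
        raise ValueError("security group IDs must not contain commas")
    return ",".join(ids)


# Placeholder IDs for illustration only.
ec2_attributes = {
    "subnet_id": "subnet-0123456789abcdef0",
    "additionalMasterSecurityGroups": comma_separated(["sg-0aaa", "sg-0bbb"]),
    "additionalSlaveSecurityGroups": comma_separated(["sg-0ccc"]),
}
```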

The instance_groups object supports the following:

  • autoscaling_policy (pulumi.Input[str]) - The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (pulumi.Input[str]) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (pulumi.Input[list]) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports

    • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB).

    • type (pulumi.Input[str]) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (pulumi.Input[str]) - The ID of the instance group.

  • instance_count (pulumi.Input[float]) - Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0+, and requires this resource’s core_instance_group to be configured. Public (Internet accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launched with multiple master nodes and this provider must have the termination_protection = false configuration applied before destroying this resource.

  • instanceRole (pulumi.Input[str]) - The role of the instance group in the cluster. Valid values are: MASTER, CORE, and TASK.

  • instance_type (pulumi.Input[str]) - EC2 instance type for all instances in the instance group.

  • name (pulumi.Input[str]) - Friendly name given to the instance group.
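The rules for the legacy instance_groups argument (at least a MASTER group, not combined with master_instance_type, master_instance_group or core_instance_group, and instanceRole limited to MASTER, CORE and TASK) can be checked in plain Python before a cluster is declared. check_instance_groups is a hypothetical helper that encodes only the rules stated in the description; it does not replace the provider's own validation.

```python
VALID_ROLES = {"MASTER", "CORE", "TASK"}


def check_instance_groups(instance_groups, master_instance_type=None,
                          master_instance_group=None, core_instance_group=None):
    """Return problems with the legacy instance_groups argument."""
    problems = []
    if not instance_groups:
        return problems
    if master_instance_type is not None:
        problems.append("specify only one of master_instance_type and instance_groups")
    if master_instance_group is not None or core_instance_group is not None:
        problems.append("instance_groups cannot be combined with "
                        "master_instance_group or core_instance_group")
    roles = [group.get("instanceRole") for group in instance_groups]
    if "MASTER" not in roles:
        problems.append("instance_groups must include a MASTER group")
    # str() keeps sorting safe if some group is missing instanceRole (None).
    unknown = sorted(str(role) for role in set(roles) if role not in VALID_ROLES)
    if unknown:
        problems.append(f"unknown instanceRole values: {unknown}")
    return problems
```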

The kerberos_attributes object supports the following:

  • adDomainJoinPassword (pulumi.Input[str]) - The Active Directory password for ad_domain_join_user. This provider cannot perform drift detection of this configuration.

  • adDomainJoinUser (pulumi.Input[str]) - Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. This provider cannot perform drift detection of this configuration.

  • crossRealmTrustPrincipalPassword (pulumi.Input[str]) - Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. This provider cannot perform drift detection of this configuration.

  • kdcAdminPassword (pulumi.Input[str]) - The password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. This provider cannot perform drift detection of this configuration.

  • realm (pulumi.Input[str]) - The name of the Kerberos realm to which all nodes in a cluster belong, for example EC2.INTERNAL.
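The password fields above are secrets, and the provider cannot detect drift in them, so it helps to keep them out of source code. A minimal sketch, assuming the KDC admin password is supplied through an environment variable whose name, EMR_KDC_ADMIN_PASSWORD, is hypothetical; in a Pulumi program an encrypted config value (for example via pulumi.Config().require_secret) would be a natural alternative. Only realm and kdcAdminPassword are set; the cross-realm and Active Directory fields are left out.

```python
import os


def kerberos_attributes_from_env(realm="EC2.INTERNAL"):
    """Return a kerberos_attributes dict with the KDC admin password read
    from the hypothetical EMR_KDC_ADMIN_PASSWORD environment variable."""
    password = os.environ.get("EMR_KDC_ADMIN_PASSWORD")
    if not password:
        raise RuntimeError("EMR_KDC_ADMIN_PASSWORD is not set")
    return {
        "realm": realm,
        "kdcAdminPassword": password,
    }
```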

The master_instance_group object supports the following:

  • bid_price (pulumi.Input[str]) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (pulumi.Input[list]) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports

    • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB).

    • type (pulumi.Input[str]) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (pulumi.Input[str]) - The ID of the instance group.

  • instance_count (pulumi.Input[float]) - Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0+, and requires this resource’s core_instance_group to be configured. Public (Internet accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launched with multiple master nodes and this provider must have the termination_protection = false configuration applied before destroying this resource.

  • instance_type (pulumi.Input[str]) - EC2 instance type for all instances in the instance group.

  • name (pulumi.Input[str]) - Friendly name given to the instance group.
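The master instance_count rules (1 or 3 nodes; three nodes only on EMR 5.23.0 or later and only together with core_instance_group) can be checked before deployment. check_master_group is a hypothetical plain-Python helper that mirrors only the rules stated in the instance_count description.

```python
import re


def check_master_group(master_instance_group, release_label, core_instance_group=None):
    """Return problems with master_instance_group per the documented rules."""
    count = master_instance_group.get("instance_count", 1)  # documented default
    problems = []
    if count not in (1, 3):
        problems.append("master instance_count must be 1 or 3")
    if count == 3:
        match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", release_label)
        # Treat an unparseable label as too old rather than guessing.
        version = tuple(int(p) for p in match.groups()) if match else (0, 0, 0)
        if version < (5, 23, 0):
            problems.append("multiple master nodes require emr-5.23.0 or later")
        if core_instance_group is None:
            problems.append("multiple master nodes require core_instance_group")
    return problems
```

The multiple-master example earlier on this page (emr-5.24.1, three masters, core_instance_group={}) passes this check. Because termination protection is switched on automatically for multiple master nodes, termination_protection=False has to be applied before the cluster can be destroyed.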

The steps object supports the following:

  • actionOnFailure (pulumi.Input[str]) - The action to take if the step fails. Valid values: TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, and CONTINUE

  • hadoopJarStep (pulumi.Input[dict]) - The JAR file used for the step. Defined below.

    • args (pulumi.Input[list]) - List of command line arguments passed to the JAR file’s main function when executed.

    • jar (pulumi.Input[str]) - Path to a JAR file run during the step.

    • mainClass (pulumi.Input[str]) - Name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.

    • properties (pulumi.Input[dict]) - Key-Value map of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.

  • name (pulumi.Input[str]) - The name of the step.
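Each entry in steps pairs a name and actionOnFailure with a hadoopJarStep. The sketch below builds entries that run a command through command-runner.jar, as the debugging example does; command_runner_step is a hypothetical helper, and the spark-submit step and its S3 path are illustrative placeholders.

```python
VALID_ACTIONS = {"TERMINATE_JOB_FLOW", "TERMINATE_CLUSTER", "CANCEL_AND_WAIT", "CONTINUE"}


def command_runner_step(name, args, action_on_failure="CONTINUE"):
    """Return one steps entry that runs `args` through command-runner.jar."""
    if action_on_failure not in VALID_ACTIONS:
        raise ValueError(f"invalid actionOnFailure: {action_on_failure}")
    return {
        "name": name,
        "actionOnFailure": action_on_failure,
        "hadoopJarStep": {
            "jar": "command-runner.jar",
            "args": list(args),
        },
    }


steps = [
    command_runner_step("Setup Hadoop Debugging", ["state-pusher-script"],
                        action_on_failure="TERMINATE_CLUSTER"),
    # Hypothetical job; replace the S3 path with a real script location.
    command_runner_step("Example Spark job",
                        ["spark-submit", "s3://my-bucket/jobs/job.py"]),
]
```

If steps are also added outside this program, pair this list with the ignoreChanges resource option described under the steps parameter.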

additional_info: pulumi.Output[str] = None

A JSON string for selecting additional features, such as adding proxy information. Note: there is currently no API to retrieve the value of this argument after the EMR cluster is created, so this provider cannot detect drift if the value is changed outside of it.

applications: pulumi.Output[list] = None

A list of applications for the cluster. Valid values are: Flink, Hadoop, Hive, Mahout, Pig, Spark, and JupyterHub (as of EMR 5.14.0). Values are case-insensitive.

autoscaling_role: pulumi.Output[str] = None

An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group.

bootstrap_actions: pulumi.Output[list] = None

Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. Defined below.

  • args (list) - List of command line arguments to pass to the bootstrap action script.

  • name (str) - The name of the bootstrap action.

  • path (str) - Location of the script to run during a bootstrap action. Can be either a location in Amazon S3 or on a local file system.

configurations: pulumi.Output[str] = None

List of configurations supplied for the EMR cluster you are creating

configurations_json: pulumi.Output[str] = None

A JSON string supplying a list of configurations for the EMR cluster.
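The configurations_json string in the examples can be produced with json.dumps instead of being written by hand, which keeps the JSON valid as entries are added. A minimal sketch reproducing the hadoop-env and spark-env JAVA_HOME exports from the examples; export_java_home is a hypothetical helper.

```python
import json


def export_java_home(classification, java_home="/usr/lib/jvm/java-1.8.0"):
    """Return one configuration entry that exports JAVA_HOME for `classification`."""
    return {
        "Classification": classification,
        "Configurations": [{
            "Classification": "export",
            "Properties": {"JAVA_HOME": java_home},
        }],
        "Properties": {},
    }


# String suitable for the configurations_json argument of aws.emr.Cluster.
configurations_json = json.dumps(
    [export_java_home("hadoop-env"), export_java_home("spark-env")],
    indent=2,
)
```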

core_instance_count: pulumi.Output[float] = None

Use the instance_count argument of the core_instance_group configuration block instead. Number of Amazon EC2 instances used to execute the job flow. EMR uses one node as the cluster’s master node and the remaining nodes (core_instance_count - 1) as core nodes. Cannot be specified if the core_instance_group or instance_group configuration blocks are set. Defaults to 1.

core_instance_group: pulumi.Output[dict] = None

Configuration block to use an Instance Group for the core node type. Cannot be specified if core_instance_count argument, core_instance_type argument, or instance_group configuration blocks are set. Detailed below.

  • autoscaling_policy (str) - The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (str) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (list) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (float) - The number of I/O operations per second (IOPS) that the volume supports

    • size (float) - The volume size, in gibibytes (GiB).

    • type (str) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (float) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (str) - The ID of the instance group.

  • instance_count (float) - Target number of instances for the instance group. Must be at least 1. Defaults to 1.

  • instance_type (str) - EC2 instance type for all instances in the instance group.

  • name (str) - Friendly name given to the instance group.

core_instance_type: pulumi.Output[str] = None

Use the core_instance_group configuration block instance_type argument instead. The EC2 instance type of the slave nodes. Cannot be specified if core_instance_group or instance_group configuration blocks are set.

custom_ami_id: pulumi.Output[str] = None

A custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). Available in Amazon EMR version 5.7.0 and later.

ebs_root_volume_size: pulumi.Output[float] = None

Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later.

ec2_attributes: pulumi.Output[dict] = None

Attributes for the EC2 instances running the job flow. Defined below

  • additionalMasterSecurityGroups (str) - String containing a comma separated list of additional Amazon EC2 security group IDs for the master node

  • additionalSlaveSecurityGroups (str) - String containing a comma separated list of additional Amazon EC2 security group IDs for the slave nodes as a comma separated string

  • emrManagedMasterSecurityGroup (str) - Identifier of the Amazon EC2 EMR-Managed security group for the master node

  • emrManagedSlaveSecurityGroup (str) - Identifier of the Amazon EC2 EMR-Managed security group for the slave nodes

  • instanceProfile (str) - Instance profile that the EC2 instances of the cluster assume

  • key_name (str) - Amazon EC2 key pair that can be used to SSH into the master node as the user called hadoop

  • serviceAccessSecurityGroup (str) - Identifier of the Amazon EC2 service-access security group, required when the cluster runs in a private subnet

  • subnet_id (str) - ID of the VPC subnet in which to launch the job flow. The cc1.4xlarge instance type cannot be used for nodes of a job flow launched in an Amazon VPC
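The ec2_attributes value is a plain dict. The sketch below assembles one, assuming the nested key spellings listed above; build_ec2_attributes is a hypothetical helper and every ID and ARN is a placeholder. Its main point is that the two additional*SecurityGroups fields take a single comma-separated string, not a list.

```python
def build_ec2_attributes(subnet_id, instance_profile, master_sg, slave_sg,
                         extra_master_sgs=(), extra_slave_sgs=(),
                         key_name=None, service_access_sg=None):
    """Return an ec2_attributes dict for aws.emr.Cluster (hypothetical helper)."""
    attrs = {
        "subnet_id": subnet_id,
        "instanceProfile": instance_profile,
        "emrManagedMasterSecurityGroup": master_sg,
        "emrManagedSlaveSecurityGroup": slave_sg,
    }
    # The additional security group fields expect one comma-separated string.
    if extra_master_sgs:
        attrs["additionalMasterSecurityGroups"] = ",".join(extra_master_sgs)
    if extra_slave_sgs:
        attrs["additionalSlaveSecurityGroups"] = ",".join(extra_slave_sgs)
    if key_name is not None:
        attrs["key_name"] = key_name
    # Only needed when the cluster runs in a private subnet.
    if service_access_sg is not None:
        attrs["serviceAccessSecurityGroup"] = service_access_sg
    return attrs


# Placeholder IDs and ARN, for illustration only.
attrs = build_ec2_attributes(
    "subnet-0123", "arn:aws:iam::123456789012:instance-profile/emr",
    "sg-master", "sg-slave", extra_master_sgs=["sg-a", "sg-b"])
print(attrs["additionalMasterSecurityGroups"])  # sg-a,sg-b
```

The resulting dict would then be passed as ec2_attributes=attrs to aws.emr.Cluster.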

instance_groups: pulumi.Output[list] = None

Deprecated: use the master_instance_group configuration block, the core_instance_group configuration block, and emr.InstanceGroup resource(s) instead. A list of instance_group objects, one for each instance group in the cluster. Exactly one of master_instance_type and instance_group must be specified. If instance_group is set, it must contain a configuration block for at least the MASTER instance group type (as well as any additional instance groups). Cannot be specified if the master_instance_group or core_instance_group configuration blocks are set. Defined below.

  • autoscaling_policy (str) - The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (str) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (list) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (float) - The number of I/O operations per second (IOPS) that the volume supports

    • size (float) - The volume size, in gibibytes (GiB).

    • type (str) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (float) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (str) - The ID of the instance group.

  • instance_count (float) - Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0 and later, and requires this resource’s core_instance_group to be configured. Public (Internet-accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launching with multiple master nodes, so termination_protection = false must be applied before this resource can be destroyed.

  • instanceRole (str) - The role of the instance group in the cluster. Valid values are: MASTER, CORE, and TASK.

  • instance_type (str) - EC2 instance type for all instances in the instance group.

  • name (str) - Friendly name given to the instance group.

keep_job_flow_alive_when_no_steps: pulumi.Output[bool] = None

Whether to keep the cluster running when there are no steps or after all steps are complete (default is true).

kerberos_attributes: pulumi.Output[dict] = None

Kerberos configuration for the cluster. Defined below

  • adDomainJoinPassword (str) - The Active Directory password for adDomainJoinUser. This provider cannot perform drift detection of this configuration.

  • adDomainJoinUser (str) - Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. This provider cannot perform drift detection of this configuration.

  • crossRealmTrustPrincipalPassword (str) - Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. This provider cannot perform drift detection of this configuration.

  • kdcAdminPassword (str) - The password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. This provider cannot perform drift detection of this configuration.

  • realm (str) - The name of the Kerberos realm to which all nodes in a cluster belong. For example, EC2.INTERNAL

log_uri: pulumi.Output[str] = None

S3 bucket to which the job flow’s log files are written. If a value is not provided, logs are not created.

master_instance_group: pulumi.Output[dict] = None

Configuration block to use an Instance Group for the master node type. Cannot be specified if master_instance_type argument or instance_group configuration blocks are set. Detailed below.

  • bid_price (str) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (list) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (float) - The number of I/O operations per second (IOPS) that the volume supports

    • size (float) - The volume size, in gibibytes (GiB).

    • type (str) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (float) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (str) - The ID of the instance group.

  • instance_count (float) - Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0 and later, and requires this resource’s core_instance_group to be configured. Public (Internet-accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launching with multiple master nodes, so termination_protection = false must be applied before this resource can be destroyed.

  • instance_type (str) - EC2 instance type for all instances in the instance group.

  • name (str) - Friendly name given to the instance group.
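Because the master group accepts only 1 or 3 instances, a small validation step can catch mistakes before deployment. The sketch below builds a master_instance_group dict with the key names listed above; build_master_instance_group is a hypothetical helper and the gp2 volume default is an illustrative choice. A three-master cluster still needs core_instance_group configured and gets termination protection enabled automatically.

```python
def build_master_instance_group(instance_type, instance_count=1,
                                ebs_size_gib=None, bid_price=None):
    """Return a master_instance_group dict (hypothetical helper)."""
    # The master group only accepts 1 or 3 instances.
    if instance_count not in (1, 3):
        raise ValueError("master instance_count must be 1 or 3")
    group = {"instance_type": instance_type, "instance_count": instance_count}
    if ebs_size_gib is not None:
        # One gp2 volume per instance; adjust type and count as needed.
        group["ebs_configs"] = [{
            "size": ebs_size_gib,
            "type": "gp2",
            "volumesPerInstance": 1,
        }]
    # Setting bid_price makes the group a Spot request; omit it for On-Demand.
    if bid_price is not None:
        group["bid_price"] = bid_price
    return group


master_group = build_master_instance_group("m5.xlarge", 3, ebs_size_gib=40)
```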

master_instance_type: pulumi.Output[str] = None

Deprecated: use the instance_type argument of the master_instance_group configuration block instead. The EC2 instance type of the master node. Cannot be specified if the master_instance_group or instance_group configuration blocks are set.

master_public_dns: pulumi.Output[str] = None

The public DNS name of the master EC2 instance.

  • core_instance_group["id"] - Core node type Instance Group ID, if an Instance Group is used for this node type.

name: pulumi.Output[str] = None

The name of the job flow.

release_label: pulumi.Output[str] = None

The release label for the Amazon EMR release

scale_down_behavior: pulumi.Output[str] = None

The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized.

security_configuration: pulumi.Output[str] = None

The security configuration name to attach to the EMR cluster. Only valid for EMR clusters with release_label 4.8.0 or greater

service_role: pulumi.Output[str] = None

IAM role that will be assumed by the Amazon EMR service to access AWS resources

step_concurrency_level: pulumi.Output[float] = None

The number of steps that can be executed concurrently. You can specify a maximum of 256. Only valid for EMR clusters with release_label 5.28.0 or greater (default is 1).
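The release-label constraint on step_concurrency_level is easy to get wrong with plain string comparison, because "emr-5.9.0" sorts after "emr-5.28.0" as text. The hypothetical helper below assumes labels of the form emr-X.Y.Z, compares them as integer tuples, and treats the default value of 1 as valid on any release (an assumption).

```python
import re


def check_step_concurrency(release_label, level):
    """Validate step_concurrency_level against release_label (hypothetical helper)."""
    # Assumes labels shaped like "emr-5.28.0".
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label}")
    version = tuple(int(part) for part in match.groups())
    if not 1 <= level <= 256:
        raise ValueError("step_concurrency_level must be between 1 and 256")
    # Anything above the default of 1 needs EMR 5.28.0 or later.
    if level > 1 and version < (5, 28, 0):
        raise ValueError("step_concurrency_level > 1 requires emr-5.28.0 or later")
    return level


level = check_step_concurrency("emr-5.28.0", 10)
```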

steps: pulumi.Output[list] = None

List of steps to run when creating the cluster. Defined below. If other steps are managed outside of this provider, it is highly recommended to use the ignore_changes resource option (see https://www.pulumi.com/docs/intro/concepts/programming-model/#ignorechanges).

  • actionOnFailure (str) - The action to take if the step fails. Valid values: TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, and CONTINUE

  • hadoopJarStep (dict) - The JAR file used for the step. Defined below.

    • args (list) - List of command line arguments passed to the JAR file’s main function when executed.

    • jar (str) - Path to a JAR file run during the step.

    • mainClass (str) - Name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.

    • properties (dict) - Key-Value map of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.

  • name (str) - The name of the step.
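Each steps entry nests a hadoopJarStep dict. The hypothetical helper below builds a step that runs spark-submit through command-runner.jar, the command runner that EMR ships on recent releases (an assumption to verify for your release_label); the bucket and script paths are placeholders. If steps are also added outside Pulumi, passing opts=pulumi.ResourceOptions(ignore_changes=["steps"]) to the cluster keeps those additions from appearing as diffs.

```python
VALID_ACTIONS = {"TERMINATE_JOB_FLOW", "TERMINATE_CLUSTER", "CANCEL_AND_WAIT", "CONTINUE"}


def spark_submit_step(name, script_uri, *script_args, action_on_failure="CONTINUE"):
    """Return one entry for the steps list of aws.emr.Cluster (hypothetical helper)."""
    if action_on_failure not in VALID_ACTIONS:
        raise ValueError(f"invalid actionOnFailure: {action_on_failure}")
    return {
        "name": name,
        "actionOnFailure": action_on_failure,
        "hadoopJarStep": {
            # command-runner.jar runs the given command (here spark-submit).
            "jar": "command-runner.jar",
            "args": ["spark-submit", script_uri, *script_args],
        },
    }


steps = [spark_submit_step("daily-etl", "s3://example-bucket/jobs/etl.py",
                           "--date", "2020-01-01")]
```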

tags: pulumi.Output[dict] = None

A map of tags to apply to the EMR cluster.

termination_protection: pulumi.Output[bool] = None

Whether termination protection is enabled (default is false, except when using multiple master nodes). Before destroying a resource that has termination protection enabled, this configuration must be applied with its value set to false.

visible_to_all_users: pulumi.Output[bool] = None

Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Defaults to true.

static get(resource_name, id, opts=None, additional_info=None, applications=None, arn=None, autoscaling_role=None, bootstrap_actions=None, cluster_state=None, configurations=None, configurations_json=None, core_instance_count=None, core_instance_group=None, core_instance_type=None, custom_ami_id=None, ebs_root_volume_size=None, ec2_attributes=None, instance_groups=None, keep_job_flow_alive_when_no_steps=None, kerberos_attributes=None, log_uri=None, master_instance_group=None, master_instance_type=None, master_public_dns=None, name=None, release_label=None, scale_down_behavior=None, security_configuration=None, service_role=None, step_concurrency_level=None, steps=None, tags=None, termination_protection=None, visible_to_all_users=None)

Get an existing Cluster resource’s state with the given name, id, and optional extra properties used to qualify the lookup.

Parameters
  • resource_name (str) – The unique name of the resulting resource.

  • id (str) – The unique provider ID of the resource to lookup.

  • opts (pulumi.ResourceOptions) – Options for the resource.

  • additional_info (pulumi.Input[str]) – A JSON string for selecting additional features, such as adding proxy information. Note: there is currently no API for the provider to retrieve the value of this argument after the EMR cluster is created, so this provider cannot detect drift if the value is changed outside of it.

  • applications (pulumi.Input[list]) – A list of applications for the cluster. Valid values are: Flink, Hadoop, Hive, Mahout, Pig, Spark, and JupyterHub (as of EMR 5.14.0). Case insensitive

  • autoscaling_role (pulumi.Input[str]) – An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group.

  • bootstrap_actions (pulumi.Input[list]) – Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. Defined below.

  • configurations (pulumi.Input[str]) – List of configurations supplied for the EMR cluster you are creating

  • configurations_json (pulumi.Input[str]) – A JSON string for supplying a list of configurations for the EMR cluster.

  • core_instance_count (pulumi.Input[float]) – Deprecated: use the instance_count argument of the core_instance_group configuration block instead. Number of Amazon EC2 instances used to execute the job flow. EMR uses one node as the cluster’s master node and the remaining nodes (core_instance_count - 1) as core nodes. Cannot be specified if the core_instance_group or instance_group configuration blocks are set. Defaults to 1.

  • core_instance_group (pulumi.Input[dict]) –

    Configuration block to use an Instance Group for the core node type. Cannot be specified if core_instance_count argument, core_instance_type argument, or instance_group configuration blocks are set. Detailed below.

  • core_instance_type (pulumi.Input[str]) – Deprecated: use the instance_type argument of the core_instance_group configuration block instead. The EC2 instance type of the core (slave) nodes. Cannot be specified if the core_instance_group or instance_group configuration blocks are set.

  • custom_ami_id (pulumi.Input[str]) – A custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). Available in Amazon EMR version 5.7.0 and later.

  • ebs_root_volume_size (pulumi.Input[float]) – Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later.

  • ec2_attributes (pulumi.Input[dict]) – Attributes for the EC2 instances running the job flow. Defined below

  • instance_groups (pulumi.Input[list]) – Deprecated: use the master_instance_group configuration block, the core_instance_group configuration block, and emr.InstanceGroup resource(s) instead. A list of instance_group objects, one for each instance group in the cluster. Exactly one of master_instance_type and instance_group must be specified. If instance_group is set, it must contain a configuration block for at least the MASTER instance group type (as well as any additional instance groups). Cannot be specified if the master_instance_group or core_instance_group configuration blocks are set. Defined below.

  • keep_job_flow_alive_when_no_steps (pulumi.Input[bool]) – Whether to keep the cluster running when there are no steps or after all steps are complete (default is true).

  • kerberos_attributes (pulumi.Input[dict]) – Kerberos configuration for the cluster. Defined below

  • log_uri (pulumi.Input[str]) – S3 bucket to which the job flow’s log files are written. If a value is not provided, logs are not created.

  • master_instance_group (pulumi.Input[dict]) –

    Configuration block to use an Instance Group for the master node type. Cannot be specified if master_instance_type argument or instance_group configuration blocks are set. Detailed below.

  • master_instance_type (pulumi.Input[str]) – Deprecated: use the instance_type argument of the master_instance_group configuration block instead. The EC2 instance type of the master node. Cannot be specified if the master_instance_group or instance_group configuration blocks are set.

  • master_public_dns (pulumi.Input[str]) – The public DNS name of the master EC2 instance.

  • core_instance_group["id"] – Core node type Instance Group ID, if an Instance Group is used for this node type.

  • name (pulumi.Input[str]) – The name of the job flow.

  • release_label (pulumi.Input[str]) – The release label for the Amazon EMR release

  • scale_down_behavior (pulumi.Input[str]) – The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized.

  • security_configuration (pulumi.Input[str]) – The security configuration name to attach to the EMR cluster. Only valid for EMR clusters with release_label 4.8.0 or greater

  • service_role (pulumi.Input[str]) – IAM role that will be assumed by the Amazon EMR service to access AWS resources

  • step_concurrency_level (pulumi.Input[float]) – The number of steps that can be executed concurrently. You can specify a maximum of 256. Only valid for EMR clusters with release_label 5.28.0 or greater (default is 1).

  • steps (pulumi.Input[list]) – List of steps to run when creating the cluster. Defined below. If other steps are managed outside of this provider, it is highly recommended to use the ignore_changes resource option (see https://www.pulumi.com/docs/intro/concepts/programming-model/#ignorechanges).

  • tags (pulumi.Input[dict]) – A map of tags to apply to the EMR cluster.

  • termination_protection (pulumi.Input[bool]) – Whether termination protection is enabled (default is false, except when using multiple master nodes). Before destroying a resource that has termination protection enabled, this configuration must be applied with its value set to false.

  • visible_to_all_users (pulumi.Input[bool]) – Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Defaults to true.
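configurations_json expects a JSON string holding a list of configuration objects. A minimal sketch, assuming the EMR configuration object format (a Classification name plus a Properties map); emr_configurations_json is a hypothetical helper, and the classifications and property values are illustrative. sort_keys makes the serialized string deterministic from run to run.

```python
import json


def emr_configurations_json(classifications):
    """Serialize {classification: properties} into a configurations JSON string."""
    configurations = [
        {"Classification": name, "Properties": dict(properties)}
        for name, properties in classifications.items()
    ]
    # Deterministic key order keeps the string stable across runs.
    return json.dumps(configurations, sort_keys=True)


config_json = emr_configurations_json({
    "spark": {"maximizeResourceAllocation": "true"},
    "spark-defaults": {"spark.executor.memory": "4g"},
})
```

The string would be passed as configurations_json=config_json.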

The bootstrap_actions object supports the following:

  • args (pulumi.Input[list]) - List of command line arguments to pass to the bootstrap action script.

  • name (pulumi.Input[str]) - Name of the bootstrap action.

  • path (pulumi.Input[str]) - Location of the script to run during a bootstrap action. Can be either a location in Amazon S3 or on a local file system
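Bootstrap actions run in list order, so bootstrap_actions is an ordered list of dicts. The sketch below uses a hypothetical helper; the run-if arguments mirror the example at the top of this page, while both S3 paths are assumptions used as placeholders.

```python
def bootstrap_action(name, path, *args):
    """Return one bootstrap_actions entry (hypothetical helper)."""
    # path may point to Amazon S3 or to the node's local file system.
    return {"name": name, "path": path, "args": list(args)}


bootstrap_actions = [
    bootstrap_action("install-deps", "s3://example-bucket/bootstrap/install.sh", "--quiet"),
    bootstrap_action("runif", "s3://elasticmapreduce/bootstrap-actions/run-if",
                     "instance.isMaster=true", "echo running on master node"),
]
```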

The core_instance_group object supports the following:

  • autoscaling_policy (pulumi.Input[str]) - The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (pulumi.Input[str]) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (pulumi.Input[list]) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports

    • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB).

    • type (pulumi.Input[str]) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (pulumi.Input[str]) - The ID of the instance group.

  • instance_count (pulumi.Input[float]) - Target number of instances for the instance group. Must be at least 1. Defaults to 1.

  • instance_type (pulumi.Input[str]) - EC2 instance type for all instances in the instance group.

  • name (pulumi.Input[str]) - Friendly name given to the instance group.
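Because autoscaling_policy is a JSON-formatted string, it is convenient to build the policy as a dict and serialize it. The sketch below assumes the EMR automatic scaling policy structure (Constraints plus a list of Rules with a CloudWatch alarm trigger); the helper core_group_with_autoscaling, the rule name, and the 15% threshold are illustrative, so check the EMR Auto Scaling documentation for the full schema. A cluster using such a policy would also set autoscaling_role.

```python
import json


def core_group_with_autoscaling(instance_type, instance_count, min_capacity, max_capacity):
    """Return a core_instance_group dict with an autoscaling policy (hypothetical helper)."""
    if not 1 <= min_capacity <= max_capacity:
        raise ValueError("require 1 <= min_capacity <= max_capacity")
    policy = {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [{
            "Name": "ScaleOutMemoryPercentage",
            "Description": "Scale out if YARNMemoryAvailablePercentage is less than 15",
            "Action": {"SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": 1,
                "CoolDown": 300,
            }},
            "Trigger": {"CloudWatchAlarmDefinition": {
                "ComparisonOperator": "LESS_THAN",
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": 15.0,
                "Unit": "PERCENT",
            }},
        }],
    }
    return {
        "instance_type": instance_type,
        "instance_count": instance_count,
        # The provider expects a JSON string here, not a dict.
        "autoscaling_policy": json.dumps(policy),
    }


core_group = core_group_with_autoscaling("c4.large", 1, 1, 2)
```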

The ec2_attributes object supports the following:

  • additionalMasterSecurityGroups (pulumi.Input[str]) - String containing a comma separated list of additional Amazon EC2 security group IDs for the master node

  • additionalSlaveSecurityGroups (pulumi.Input[str]) - String containing a comma-separated list of additional Amazon EC2 security group IDs for the slave nodes

  • emrManagedMasterSecurityGroup (pulumi.Input[str]) - Identifier of the Amazon EC2 EMR-Managed security group for the master node

  • emrManagedSlaveSecurityGroup (pulumi.Input[str]) - Identifier of the Amazon EC2 EMR-Managed security group for the slave nodes

  • instanceProfile (pulumi.Input[str]) - Instance profile that the EC2 instances of the cluster assume

  • key_name (pulumi.Input[str]) - Amazon EC2 key pair that can be used to ssh to the master node as the user called hadoop

  • serviceAccessSecurityGroup (pulumi.Input[str]) - Identifier of the Amazon EC2 service-access security group - required when the cluster runs on a private subnet

  • subnet_id (pulumi.Input[str]) - ID of the VPC subnet in which to launch the job flow. The cc1.4xlarge instance type cannot be used for nodes of a job flow launched in an Amazon VPC

The instance_groups object supports the following:

  • autoscaling_policy (pulumi.Input[str]) - The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (pulumi.Input[str]) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (pulumi.Input[list]) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports

    • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB).

    • type (pulumi.Input[str]) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (pulumi.Input[str]) - The ID of the instance group.

  • instance_count (pulumi.Input[float]) - Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0 and later, and requires this resource’s core_instance_group to be configured. Public (Internet-accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launching with multiple master nodes, so termination_protection = false must be applied before this resource can be destroyed.

  • instanceRole (pulumi.Input[str]) - The role of the instance group in the cluster. Valid values are: MASTER, CORE, and TASK.

  • instance_type (pulumi.Input[str]) - EC2 instance type for all instances in the instance group.

  • name (pulumi.Input[str]) - Friendly name given to the instance group.

The kerberos_attributes object supports the following:

  • adDomainJoinPassword (pulumi.Input[str]) - The Active Directory password for adDomainJoinUser. This provider cannot perform drift detection of this configuration.

  • adDomainJoinUser (pulumi.Input[str]) - Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. This provider cannot perform drift detection of this configuration.

  • crossRealmTrustPrincipalPassword (pulumi.Input[str]) - Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. This provider cannot perform drift detection of this configuration.

  • kdcAdminPassword (pulumi.Input[str]) - The password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. This provider cannot perform drift detection of this configuration.

  • realm (pulumi.Input[str]) - The name of the Kerberos realm to which all nodes in a cluster belong. For example, EC2.INTERNAL
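A sketch of assembling kerberos_attributes that adds the cross-realm fields only when they are supplied. build_kerberos_attributes is hypothetical, and the rule that adDomainJoinPassword must accompany adDomainJoinUser is an assumption drawn from the field descriptions above. In a real program the passwords would normally come from secret configuration, for example pulumi.Config().require_secret(...), rather than the placeholder literals used here.

```python
def build_kerberos_attributes(realm, kdc_admin_password,
                              ad_domain_join_user=None, ad_domain_join_password=None,
                              cross_realm_trust_principal_password=None):
    """Return a kerberos_attributes dict (hypothetical helper)."""
    attrs = {"realm": realm, "kdcAdminPassword": kdc_admin_password}
    # Active Directory fields only matter for a cross-realm trust with AD.
    if ad_domain_join_user is not None:
        if ad_domain_join_password is None:
            raise ValueError("adDomainJoinPassword is required with adDomainJoinUser")
        attrs["adDomainJoinUser"] = ad_domain_join_user
        attrs["adDomainJoinPassword"] = ad_domain_join_password
    # Only needed for a cross-realm trust with a KDC in another realm.
    if cross_realm_trust_principal_password is not None:
        attrs["crossRealmTrustPrincipalPassword"] = cross_realm_trust_principal_password
    return attrs


kerberos = build_kerberos_attributes("EC2.INTERNAL", "placeholder-kdc-password")
```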

The master_instance_group object supports the following:

  • bid_price (pulumi.Input[str]) - Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • ebs_configs (pulumi.Input[list]) - Configuration block(s) for EBS volumes attached to each instance in the instance group. Detailed below.

    • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports

    • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB).

    • type (pulumi.Input[str]) - The volume type. Valid options are gp2, io1, standard and st1. See EBS Volume Types.

    • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1)

  • id (pulumi.Input[str]) - The ID of the instance group.

  • instance_count (pulumi.Input[float]) - Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0 and later, and requires this resource’s core_instance_group to be configured. Public (Internet-accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launching with multiple master nodes, so termination_protection = false must be applied before this resource can be destroyed.

  • instance_type (pulumi.Input[str]) - EC2 instance type for all instances in the instance group.

  • name (pulumi.Input[str]) - Friendly name given to the instance group.

The steps object supports the following:

  • actionOnFailure (pulumi.Input[str]) - The action to take if the step fails. Valid values: TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, and CONTINUE

  • hadoopJarStep (pulumi.Input[dict]) - The JAR file used for the step. Defined below.

    • args (pulumi.Input[list]) - List of command line arguments passed to the JAR file’s main function when executed.

    • jar (pulumi.Input[str]) - Path to a JAR file run during the step.

    • mainClass (pulumi.Input[str]) - Name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.

    • properties (pulumi.Input[dict]) - Key-Value map of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.

  • name (pulumi.Input[str]) - The name of the step.

translate_output_property(prop)

Provides subclasses of Resource an opportunity to translate names of output properties into a format of their choosing before writing those properties to the resource object.

Parameters

prop (str) – A property name.

Returns

A potentially transformed property name.

Return type

str

translate_input_property(prop)

Provides subclasses of Resource an opportunity to translate names of input properties into a format of their choosing before sending those properties to the Pulumi engine.

Parameters

prop (str) – A property name.

Returns

A potentially transformed property name.

Return type

str
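These two hooks map between the snake_case names used in Python and the camelCase names used by the Pulumi engine. Generated pulumi_aws classes of this vintage typically implement them as dictionary lookups that fall back to the original name; the standalone sketch below mimics that behavior with a tiny hypothetical table instead of subclassing pulumi.CustomResource.

```python
# Hypothetical excerpt of a name table; generated tables are far larger.
_SNAKE_TO_CAMEL_CASE_TABLE = {
    "release_label": "releaseLabel",
    "service_role": "serviceRole",
}
_CAMEL_TO_SNAKE_CASE_TABLE = {v: k for k, v in _SNAKE_TO_CAMEL_CASE_TABLE.items()}


def translate_input_property(prop):
    # Python name -> engine name; unknown names pass through unchanged.
    return _SNAKE_TO_CAMEL_CASE_TABLE.get(prop) or prop


def translate_output_property(prop):
    # Engine name -> Python name; unknown names pass through unchanged.
    return _CAMEL_TO_SNAKE_CASE_TABLE.get(prop) or prop
```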

class pulumi_aws.emr.InstanceGroup(resource_name, opts=None, autoscaling_policy=None, bid_price=None, cluster_id=None, configurations_json=None, ebs_configs=None, ebs_optimized=None, instance_count=None, instance_type=None, name=None, __props__=None, __name__=None, __opts__=None)

Provides an Elastic MapReduce Cluster Instance Group configuration. See Amazon Elastic MapReduce Documentation for more information.

NOTE: At this time, Instance Groups cannot be destroyed through either the API or the web interface; they are destroyed only when the EMR Cluster is destroyed. When destroying this resource, this provider instead resizes the Instance Group to zero instances.

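import pulumi
import pulumi_aws as aws

# ID of an existing EMR cluster; "emrClusterId" is an illustrative config key.
# Within one program you could pass cluster.id from an aws.emr.Cluster instead.
cluster_id = pulumi.Config().require("emrClusterId")

# Attach a single-instance task group to that cluster.
task = aws.emr.InstanceGroup("task",
    cluster_id=cluster_id,
    instance_count=1,
    instance_type="m5.xlarge")
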
Parameters
  • resource_name (str) – The name of the resource.

  • opts (pulumi.ResourceOptions) – Options for the resource.

  • autoscaling_policy (pulumi.Input[str]) –

    The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (pulumi.Input[str]) – If set, the bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • cluster_id (pulumi.Input[str]) – ID of the EMR Cluster to attach to. Changing this forces a new resource to be created.

  • configurations_json (pulumi.Input[str]) – A JSON string for supplying a list of configurations specific to the EMR instance group. Note that this can only be changed when using EMR release 5.21 or later.

  • ebs_configs (pulumi.Input[list]) – One or more ebs_config blocks as defined below. Changing this forces a new resource to be created.

  • ebs_optimized (pulumi.Input[bool]) – Indicates whether an Amazon EBS volume is EBS-optimized. Changing this forces a new resource to be created.

  • instance_count (pulumi.Input[float]) – Target number of instances for the instance group. Defaults to 0.

  • instance_type (pulumi.Input[str]) – The EC2 instance type for all instances in the instance group. Changing this forces a new resource to be created.

  • name (pulumi.Input[str]) – Human-friendly name given to the instance group. Changing this forces a new resource to be created.

The ebs_configs object supports the following:

  • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports.

  • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB). This can be a number from 1 to 1024. If the volume type is EBS-optimized, the minimum value is 10.

  • type (pulumi.Input[str]) - The volume type. Valid options are ‘gp2’, ‘io1’ and ‘standard’.

  • volumesPerInstance (pulumi.Input[float]) - The number of EBS Volumes to attach per instance.
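Because changing ebs_configs forces a new instance group, validating entries before deployment can save a replacement. This sketch checks only the constraints listed above (volume type gp2, io1, or standard; size from 1 to 1024 GiB); ebs_config is a hypothetical helper, and it does not enforce the EBS-optimized minimum of 10 GiB or any IOPS rules.

```python
# Volume types accepted by InstanceGroup ebs_configs, per the list above.
VALID_VOLUME_TYPES = {"gp2", "io1", "standard"}


def ebs_config(size, volume_type="gp2", volumes_per_instance=1, iops=None):
    """Return one ebs_configs entry for aws.emr.InstanceGroup (hypothetical helper)."""
    if volume_type not in VALID_VOLUME_TYPES:
        raise ValueError(f"unsupported volume type: {volume_type}")
    if not 1 <= size <= 1024:
        raise ValueError("size must be between 1 and 1024 GiB")
    config = {"size": size, "type": volume_type, "volumesPerInstance": volumes_per_instance}
    # Include iops only when given explicitly.
    if iops is not None:
        config["iops"] = iops
    return config


ebs_configs = [ebs_config(100), ebs_config(200, "io1", iops=1000)]
```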

autoscaling_policy: pulumi.Output[str] = None

The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

bid_price: pulumi.Output[str] = None

If set, the bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

cluster_id: pulumi.Output[str] = None

ID of the EMR Cluster to attach to. Changing this forces a new resource to be created.

configurations_json: pulumi.Output[str] = None

A JSON string for supplying a list of configurations specific to the EMR instance group. Note that this can only be changed when using EMR release 5.21 or later.

ebs_configs: pulumi.Output[list] = None

One or more ebs_config blocks as defined below. Changing this forces a new resource to be created.

  • iops (float) - The number of I/O operations per second (IOPS) that the volume supports.

  • size (float) - The volume size, in gibibytes (GiB). This can be a number from 1 to 1024. If the volume type is EBS-optimized, the minimum value is 10.

  • type (str) - The volume type. Valid options are ‘gp2’, ‘io1’ and ‘standard’.

  • volumesPerInstance (float) - The number of EBS Volumes to attach per instance.

ebs_optimized: pulumi.Output[bool] = None

Indicates whether an Amazon EBS volume is EBS-optimized. Changing this forces a new resource to be created.

instance_count: pulumi.Output[float] = None

Target number of instances for the instance group. Defaults to 0.

instance_type: pulumi.Output[str] = None

The EC2 instance type for all instances in the instance group. Changing this forces a new resource to be created.

name: pulumi.Output[str] = None

Human-friendly name given to the instance group. Changing this forces a new resource to be created.

static get(resource_name, id, opts=None, autoscaling_policy=None, bid_price=None, cluster_id=None, configurations_json=None, ebs_configs=None, ebs_optimized=None, instance_count=None, instance_type=None, name=None, running_instance_count=None, status=None)

Get an existing InstanceGroup resource’s state with the given name, id, and optional extra properties used to qualify the lookup.

Parameters
  • resource_name (str) – The unique name of the resulting resource.

  • id (str) – The unique provider ID of the resource to lookup.

  • opts (pulumi.ResourceOptions) – Options for the resource.

  • autoscaling_policy (pulumi.Input[str]) –

    The autoscaling policy document. This is a JSON formatted string. See EMR Auto Scaling

  • bid_price (pulumi.Input[str]) – If set, the bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.

  • cluster_id (pulumi.Input[str]) – ID of the EMR Cluster to attach to. Changing this forces a new resource to be created.

  • configurations_json (pulumi.Input[str]) – A JSON string supplying a list of configurations specific to the EMR instance group. This can only be changed when using EMR release 5.21 or later.

  • ebs_configs (pulumi.Input[list]) – One or more ebs_config blocks as defined below. Changing this forces a new resource to be created.

  • ebs_optimized (pulumi.Input[bool]) – Indicates whether an Amazon EBS volume is EBS-optimized. Changing this forces a new resource to be created.

  • instance_count (pulumi.Input[float]) – Target number of instances for the instance group. Defaults to 0.

  • instance_type (pulumi.Input[str]) – The EC2 instance type for all instances in the instance group. Changing this forces a new resource to be created.

  • name (pulumi.Input[str]) – Human-friendly name given to the instance group. Changing this forces a new resource to be created.

The ebs_configs object supports the following:

  • iops (pulumi.Input[float]) - The number of I/O operations per second (IOPS) that the volume supports.

  • size (pulumi.Input[float]) - The volume size, in gibibytes (GiB). This can be a number from 1 to 1024. If the volume type is EBS-optimized, the minimum value is 10.

  • type (pulumi.Input[str]) - The volume type. Valid options are ‘gp2’, ‘io1’ and ‘standard’.

  • volumesPerInstance (pulumi.Input[float]) - The number of EBS volumes to attach per instance.
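Both autoscaling_policy and configurations_json are plain JSON strings, so generating them with json.dumps avoids hand-written quoting mistakes. The sketch below is illustrative only: build_autoscaling_policy and build_configurations_json are hypothetical helpers, the policy follows the EMR automatic scaling format (Constraints plus Rules with a CloudWatch alarm trigger), and the rule values (15% available YARN memory, 300-second period) are example numbers, not recommendations.

```python
import json


def build_autoscaling_policy(min_capacity, max_capacity):
    """Return an autoscaling_policy JSON string with one scale-out rule.

    Hypothetical helper; the threshold and period values are illustrative.
    """
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    policy = {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [{
            "Name": "ScaleOutMemoryPercentage",
            "Description": "Scale out if YARNMemoryAvailablePercentage is less than 15",
            "Action": {"SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": 1,  # add one instance per trigger
                "CoolDown": 300,
            }},
            "Trigger": {"CloudWatchAlarmDefinition": {
                "ComparisonOperator": "LESS_THAN",
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": 15.0,
                "Unit": "PERCENT",
            }},
        }],
    }
    return json.dumps(policy)


def build_configurations_json(classifications):
    """Return a configurations_json string from {classification: {property: value}}.

    Hypothetical helper producing the EMR list-of-classifications format.
    """
    return json.dumps([
        {"Classification": name, "Properties": props}
        for name, props in classifications.items()
    ])


policy_json = build_autoscaling_policy(1, 4)
configs_json = build_configurations_json({"spark-defaults": {"spark.executor.memory": "2g"}})
```

The two strings would then be passed as autoscaling_policy and configurations_json; remember that configurations_json can only be changed on EMR release 5.21 or later.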

translate_output_property(prop)

Provides subclasses of Resource an opportunity to translate names of output properties into a format of their choosing before writing those properties to the resource object.

Parameters

prop (str) – A property name.

Returns

A potentially transformed property name.

Return type

str

translate_input_property(prop)

Provides subclasses of Resource an opportunity to translate names of input properties into a format of their choosing before sending those properties to the Pulumi engine.

Parameters

prop (str) – A property name.

Returns

A potentially transformed property name.

Return type

str
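The two hooks above translate property names between the Pulumi engine and Python. As a minimal sketch, assuming a plain dictionary lookup that passes unknown names through unchanged, the mapping could look like the following. The table and the module-level functions are hypothetical stand-ins for the subclass methods; a real provider ships its own, much larger, generated tables.

```python
# Hypothetical translation table; a real provider generates a much larger one.
_CAMEL_TO_SNAKE = {
    "clusterId": "cluster_id",
    "instanceCount": "instance_count",
    "runningInstanceCount": "running_instance_count",
}
# Reverse mapping for the input direction.
_SNAKE_TO_CAMEL = {snake: camel for camel, snake in _CAMEL_TO_SNAKE.items()}


def translate_output_property(prop):
    """Engine (camelCase) name -> Python (snake_case) attribute name."""
    return _CAMEL_TO_SNAKE.get(prop, prop)


def translate_input_property(prop):
    """Python (snake_case) argument name -> engine (camelCase) name."""
    return _SNAKE_TO_CAMEL.get(prop, prop)
```

Passing unknown names through unchanged keeps the hooks safe for properties that need no translation, such as name or status.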

class pulumi_aws.emr.SecurityConfiguration(resource_name, opts=None, configuration=None, name=None, name_prefix=None, __props__=None, __name__=None, __opts__=None)

Provides a resource to manage AWS EMR Security Configurations.

import pulumi
import pulumi_aws as aws

foo = aws.emr.SecurityConfiguration("foo", configuration="""{
  "EncryptionConfiguration": {
    "AtRestEncryptionConfiguration": {
      "S3EncryptionConfiguration": {
        "EncryptionMode": "SSE-S3"
      },
      "LocalDiskEncryptionConfiguration": {
        "EncryptionKeyProviderType": "AwsKms",
        "AwsKmsKey": "arn:aws:kms:us-west-2:187416307283:alias/tf_emr_test_key"
      }
    },
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true
  }
}

""")
Parameters
  • resource_name (str) – The name of the resource.

  • opts (pulumi.ResourceOptions) – Options for the resource.

  • configuration (pulumi.Input[str]) – A JSON-formatted Security Configuration.

  • name (pulumi.Input[str]) – The name of the EMR Security Configuration. If omitted, a name is generated by this provider.

  • name_prefix (pulumi.Input[str]) – Creates a unique name beginning with the specified prefix. Conflicts with name.
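The configuration string in the example above can also be generated from a Python dictionary with json.dumps, which guarantees valid JSON and makes the KMS key easy to parameterize. This is a sketch: build_security_configuration is a hypothetical helper, the structure simply mirrors the example above, and in-transit encryption stays disabled because enabling it would require additional TLS settings not covered here.

```python
import json


def build_security_configuration(kms_key_arn):
    """Return a Security Configuration JSON string mirroring the example above.

    Hypothetical helper: SSE-S3 for S3 data, KMS-encrypted local disks,
    in-transit encryption disabled.
    """
    config = {
        "EncryptionConfiguration": {
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
        }
    }
    return json.dumps(config, indent=2)


# Placeholder ARN for illustration only.
configuration = build_security_configuration(
    "arn:aws:kms:us-west-2:111122223333:alias/example-emr-key"
)
```

The resulting string would be passed as the configuration argument, e.g. aws.emr.SecurityConfiguration("foo", configuration=configuration).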

configuration: pulumi.Output[str] = None

A JSON-formatted Security Configuration.

creation_date: pulumi.Output[str] = None

Date the Security Configuration was created.

name: pulumi.Output[str] = None

The name of the EMR Security Configuration. If omitted, a name is generated by this provider.

name_prefix: pulumi.Output[str] = None

Creates a unique name beginning with the specified prefix. Conflicts with name.
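Because name and name_prefix conflict, a program that computes them dynamically may want to fail early rather than at deployment time. The sketch below is a hypothetical helper (security_configuration_name_args is not part of pulumi_aws) that returns the naming keyword arguments and rejects the conflicting combination; when neither is set it returns nothing, leaving the provider to generate a name.

```python
def security_configuration_name_args(name=None, name_prefix=None):
    """Return naming kwargs for SecurityConfiguration, rejecting name + name_prefix.

    Hypothetical helper for illustration only.
    """
    if name is not None and name_prefix is not None:
        raise ValueError("name conflicts with name_prefix; set only one")
    if name is not None:
        return {"name": name}
    if name_prefix is not None:
        return {"name_prefix": name_prefix}
    return {}  # neither set: the provider generates a name
```

For example, aws.emr.SecurityConfiguration("foo", configuration=..., **security_configuration_name_args(name_prefix="emr-sc-")) would request a generated name starting with emr-sc-.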

static get(resource_name, id, opts=None, configuration=None, creation_date=None, name=None, name_prefix=None)

Get an existing SecurityConfiguration resource’s state with the given name, id, and optional extra properties used to qualify the lookup.

Parameters
  • resource_name (str) – The unique name of the resulting resource.

  • id (str) – The unique provider ID of the resource to lookup.

  • opts (pulumi.ResourceOptions) – Options for the resource.

  • configuration (pulumi.Input[str]) – A JSON-formatted Security Configuration.

  • creation_date (pulumi.Input[str]) – Date the Security Configuration was created.

  • name (pulumi.Input[str]) – The name of the EMR Security Configuration. If omitted, a name is generated by this provider.

  • name_prefix (pulumi.Input[str]) – Creates a unique name beginning with the specified prefix. Conflicts with name.

translate_output_property(prop)

Provides subclasses of Resource an opportunity to translate names of output properties into a format of their choosing before writing those properties to the resource object.

Parameters

prop (str) – A property name.

Returns

A potentially transformed property name.

Return type

str

translate_input_property(prop)

Provides subclasses of Resource an opportunity to translate names of input properties into a format of their choosing before sending those properties to the Pulumi engine.

Parameters

prop (str) – A property name.

Returns

A potentially transformed property name.

Return type

str