1. Simplified AWS Resource Access for Databricks with Instance Profiles


    When working with AWS and Databricks, an instance profile is the AWS resource through which Databricks clusters obtain credentials for AWS services. The instance profile acts as a container for an AWS Identity and Access Management (IAM) role and delegates that role's permissions to Databricks clusters without embedding long-term AWS keys.

    To set up simplified AWS resource access for Databricks with instance profiles using Pulumi, you'll need to:

    1. Create an IAM role with the necessary policies attached.
    2. Create an instance profile and associate the IAM role with this profile.
    3. Provide the instance profile to Databricks clusters, so they can use it to access AWS resources.

    Below is a Pulumi program written in Python that creates an IAM role, attaches a policy to it, and associates it with an instance profile. It then uses the databricks.InstanceProfile resource to register this AWS instance profile with Databricks.

    import pulumi
    import pulumi_aws as aws
    import pulumi_databricks as databricks

    # Create an IAM role for Databricks to access AWS services.
    # Databricks cluster nodes run as EC2 instances, so the trust policy
    # must allow the EC2 service to assume the role.
    databricks_role = aws.iam.Role("databricksIAMRole",
        assume_role_policy="""{
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole"
            }]
        }"""
    )

    # Attach the necessary policy to the role. This can be an AWS managed policy or a custom one.
    # For example purposes, we use the AmazonS3ReadOnlyAccess managed policy here.
    policy_attachment = aws.iam.RolePolicyAttachment("databricksS3ReadAccess",
        role=databricks_role.name,
        policy_arn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
    )

    # Create an IAM instance profile and associate the role with it
    instance_profile = aws.iam.InstanceProfile("databricksInstanceProfile",
        role=databricks_role.name
    )

    # Register the instance profile with Databricks
    databricks_instance_profile = databricks.InstanceProfile("myInstanceProfile",
        instance_profile_arn=instance_profile.arn,
        skip_validation=True  # Set to True to skip validation of the instance profile
    )

    # Export the instance profile ARN to use later or in other Pulumi programs
    pulumi.export('instance_profile_arn', instance_profile.arn)


    In the above program:

    • We first create an IAM role (databricksIAMRole) with an assume role policy document that allows the EC2 service to assume this role. Databricks cluster nodes run as EC2 instances, so this is the principal that needs the trust relationship.
    • We attach the AmazonS3ReadOnlyAccess managed policy to the role we created. This step determines what AWS resources the role can access. You would attach different policies based on the actual requirements.
    • We then create an instance profile (databricksInstanceProfile) and associate the IAM role with it.
    • Finally, we register the instance profile with Databricks using databricks.InstanceProfile. Note the skip_validation field set to True, which tells Databricks to register the instance profile without validating it first. Set it to False if you want the profile validated when it is registered.
    • The program exports instance_profile_arn, the Amazon Resource Name (ARN) of the instance profile we created. This ARN can be used to configure Databricks clusters to use the instance profile for accessing AWS resources.
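As a rough sketch of how the exported ARN might be consumed, the helper below assembles the settings for a cluster that uses the instance profile. The function name cluster_settings, the cluster name, the Spark version, and the node type are all hypothetical placeholders, not values taken from the program above; the key point is the aws_attributes entry that carries the instance profile ARN.

```python
def cluster_settings(instance_profile_arn):
    """Build keyword arguments for a cluster that uses the instance profile.

    All values except the ARN are illustrative assumptions; pick a Spark
    version and node type that are actually available in your workspace.
    """
    return {
        "cluster_name": "instance-profile-demo",   # hypothetical name
        "spark_version": "13.3.x-scala2.12",       # assumed runtime version
        "node_type_id": "i3.xlarge",               # assumed node type
        "num_workers": 1,
        "autotermination_minutes": 20,
        # The cluster nodes receive AWS credentials through this profile.
        "aws_attributes": {"instance_profile_arn": instance_profile_arn},
    }


if __name__ == "__main__":
    example_arn = "arn:aws:iam::123456789012:instance-profile/example"
    for key, value in cluster_settings(example_arn).items():
        print(f"{key}: {value}")
```

Inside the Pulumi program, these settings could plausibly be passed to the databricks.Cluster resource, for example as databricks.Cluster("profileCluster", **cluster_settings(databricks_instance_profile.id)). Referencing the Databricks-side resource rather than the raw AWS ARN should make Pulumi create the cluster only after the profile has been registered.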

    By running this Pulumi program, you will have the AWS IAM setup that Databricks needs to access AWS resources through an instance profile.

    Remember to review the IAM permissions and adjust the policies attached to the role so that the principle of least privilege is followed, granting only the permissions the Databricks clusters need for their tasks.
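To illustrate what a narrower policy could look like, here is a minimal sketch that builds a read-only policy document scoped to a single S3 bucket instead of relying on the account-wide AmazonS3ReadOnlyAccess managed policy. The function name s3_read_only_policy and the bucket name are hypothetical; adjust the actions to whatever your workloads actually require.

```python
import json


def s3_read_only_policy(bucket_name):
    """Return an IAM policy document (JSON string) granting read access
    to one bucket only. The bucket name is supplied by the caller."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # "my-databricks-data" is a placeholder bucket name.
    print(s3_read_only_policy("my-databricks-data"))
```

In the Pulumi program, the returned string could replace the managed policy attachment, for instance by passing it as the policy argument of an aws.iam.RolePolicy resource whose role argument is databricks_role.id.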