1. Automating Databricks Cluster Access with Group Instance Profiles

    Python

    To automate Databricks Cluster Access with Group Instance Profiles in Pulumi, you would use the following resources:

    • databricks.InstanceProfile: This represents an AWS IAM instance profile that is associated with Databricks clusters to give them access to AWS services. You create the instance profile in AWS and then register it with Databricks using this resource.
    • databricks.Group: This represents a group within Databricks. Groups are used to manage access to resources and organize users.
    • databricks.GroupInstanceProfile: Associates a Databricks group with an instance profile, giving all users in the group the ability to launch clusters using that instance profile.

    Here's how you could use these resources in a Pulumi Python program. This program assumes that you have already configured AWS and Databricks providers in Pulumi and have the necessary permissions to create and manage resources in both AWS and Databricks.

    import json
    import pulumi
    import pulumi_aws as aws
    import pulumi_databricks as databricks

    # First, create an IAM role for the EC2 instances where the Databricks clusters will run.
    ec2_role = aws.iam.Role("ec2Role",
        assume_role_policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
            }],
        }))

    # Then, attach a policy to the role, granting the permissions the clusters need.
    # Make sure to customize the policy according to the access level required for your use case.
    policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"  # Example policy ARN; change it to suit your needs
    aws.iam.RolePolicyAttachment("rolePolicyAttachment",
        role=ec2_role.name,
        policy_arn=policy_arn)

    # Create an IAM instance profile and attach the role to it.
    instance_profile = aws.iam.InstanceProfile("instanceProfile",
        role=ec2_role.name)

    # Register the IAM instance profile with Databricks.
    databricks_instance_profile = databricks.InstanceProfile("databricksInstanceProfile",
        instance_profile_arn=instance_profile.arn,
        skip_validation=True)

    # Create a Databricks group with the display name 'data-scientists-group'.
    databricks_group = databricks.Group("dataScientistsGroup",
        display_name="data-scientists-group")

    # Finally, associate the group with the instance profile.
    group_instance_profile = databricks.GroupInstanceProfile("groupInstanceProfile",
        group_id=databricks_group.id,
        instance_profile_id=databricks_instance_profile.id)

    # Export the group and instance profile IDs so they can be used outside of Pulumi.
    pulumi.export("group_id", databricks_group.id)
    pulumi.export("instance_profile_id", databricks_instance_profile.id)

    This program sets up an IAM role and policy, attaches them to an IAM instance profile, creates a Databricks group, and associates the group with the instance profile. The role's policy determines what AWS resources the cluster can access when running. The cluster itself will be managed through Databricks, but it will use the instance profile that you have linked above.

    Please note that you need to substitute "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess" with the ARN of an IAM policy that grants the permissions your clusters actually need. You may also want to add further configuration to the IAM role and the instance profile as required by your setup.
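    If no AWS managed policy fits your access requirements, you can write a custom policy document and attach it inline with aws.iam.RolePolicy instead of using RolePolicyAttachment. The sketch below shows what such a document might look like; the bucket name 'my-data-bucket' is a placeholder, not something from the program above.

    ```python
    import json

    # A custom policy document granting read-only access to a single S3 bucket.
    # 'my-data-bucket' is a placeholder -- substitute your own bucket name.
    custom_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-data-bucket",
                "arn:aws:s3:::my-data-bucket/*",
            ],
        }],
    }

    # IAM expects the policy as a JSON string.
    policy_json = json.dumps(custom_policy)
    ```

    You would then pass this string as the policy argument of an aws.iam.RolePolicy attached to ec2_role, in place of the RolePolicyAttachment shown earlier.
    
    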

    When this Pulumi program runs, it ensures that all users in the 'data-scientists-group' group in Databricks can launch clusters whose AWS access is configured via the 'ec2Role' IAM role.
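    For completeness, here is a sketch of how a cluster could then be launched with that instance profile. This is an assumption-laden example, not part of the program above: the Spark version and node type are placeholders you would replace with values valid in your workspace, and it reuses the instance_profile variable from the main program. It is infrastructure configuration that only runs under the Pulumi engine.

    ```python
    import pulumi_databricks as databricks

    # Sketch: a cluster that launches with the registered instance profile.
    # 'spark_version' and 'node_type_id' below are placeholders -- pick values
    # that are valid in your Databricks workspace.
    cluster = databricks.Cluster("dataScienceCluster",
        cluster_name="data-science-cluster",
        spark_version="13.3.x-scala2.12",
        node_type_id="i3.xlarge",
        num_workers=2,
        autotermination_minutes=30,
        aws_attributes=databricks.ClusterAwsAttributesArgs(
            # 'instance_profile' is the aws.iam.InstanceProfile from the main program.
            instance_profile_arn=instance_profile.arn,
        ))
    ```

    Users in the group could achieve the same through the Databricks UI by selecting the instance profile when creating a cluster.
    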