1. Fine-Grained Permissions for Databricks Notebooks and Experiments

    Fine-grained permissions in Databricks control which users and groups can access specific resources such as notebooks, MLflow experiments, and data. They are implemented through Databricks' access control lists (ACLs), which you normally manage through the Databricks UI or REST APIs. Since this guide does not use a Databricks-specific Pulumi provider, the ACLs themselves are not modeled as Pulumi resources here; instead, Pulumi provisions the infrastructure underneath, and the ACLs are configured through Databricks itself.
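
    For a sense of what that looks like in practice, here is a minimal sketch of granting permissions over the workspace Permissions API (PATCH /api/2.0/permissions/{object_type}/{object_id}). The host, token, object IDs, and principal names are placeholders; consult the Databricks Permissions API reference for the permission levels each object type supports.

    import os

    import requests

    # Placeholders: point these at your workspace and a personal access token.
    HOST = os.environ["DATABRICKS_HOST"]   # e.g. "https://<workspace>.cloud.databricks.com"
    TOKEN = os.environ["DATABRICKS_TOKEN"]

    def grant(object_type: str, object_id: str, principal: dict, level: str) -> None:
        """Add one access-control entry to a workspace object's ACL."""
        resp = requests.patch(
            f"{HOST}/api/2.0/permissions/{object_type}/{object_id}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"access_control_list": [{**principal, "permission_level": level}]},
        )
        resp.raise_for_status()

    # Give a user read access to a notebook, and a group edit access to an
    # MLflow experiment (both IDs are placeholders).
    grant("notebooks", "12345", {"user_name": "analyst@example.com"}, "CAN_READ")
    grant("experiments", "67890", {"group_name": "data-science"}, "CAN_EDIT")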

    However, we can use Pulumi to provision a Databricks workspace's foundations and to manage the cloud resources Databricks integrates with, such as object storage and compute instances. That foundation is what fine-grained permissions are later layered on, whether you configure them through the Databricks platform itself or through its APIs.
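
    As a small example of that foundation, the sketch below provisions an S3 bucket that could serve as a workspace's root (DBFS) storage; the bucket name and settings are illustrative.

    import pulumi
    import pulumi_aws as aws

    # Root storage bucket for the workspace's DBFS data (name is illustrative).
    root_bucket = aws.s3.Bucket(
        "databricks-root-bucket",
        acl="private",
        tags={"Name": "pulumi-databricks-root"},
    )

    pulumi.export("root_bucket_name", root_bucket.id)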

    Let's see how to set this up with Pulumi. In this example, we'll use the AWS provider to create the networking and IAM prerequisites for a Databricks workspace; bear in mind that you'll still need to configure fine-grained permissions inside the workspace once it exists.

    import json

    import pulumi
    import pulumi_aws as aws

    # Create a new AWS VPC for the Databricks workspace
    vpc = aws.ec2.Vpc(
        "databricks-vpc",
        cidr_block="10.0.0.0/16",
        enable_dns_hostnames=True,
        tags={"Name": "pulumi-databricks-vpc"},
    )

    # Create an internet gateway and attach it to the VPC
    internet_gateway = aws.ec2.InternetGateway(
        "vpc-igw",
        vpc_id=vpc.id,
        tags={"Name": "pulumi-databricks-igw"},
    )

    # Add subnets, security groups, and other networking here as needed.

    # Cross-account IAM role that lets Databricks manage AWS resources on your
    # behalf. The trust policy's principal is Databricks' own AWS account
    # (414351767826, per the Databricks cross-account role documentation), and
    # the external ID should be your Databricks account ID (placeholder below).
    databricks_service_role = aws.iam.Role(
        "databricks-service-role",
        assume_role_policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::414351767826:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": "<your-databricks-account-id>"},
                },
            }],
        }),
    )

    # Attach policies to the service role as required by Databricks
    aws.iam.RolePolicyAttachment(
        "databricks-policy-attachment",
        policy_arn="arn:aws:iam::aws:policy/AmazonS3FullAccess",  # example policy; scope down for production
        role=databricks_service_role.name,
    )

    # Export the IAM role ARN for use when configuring the Databricks workspace
    pulumi.export("databricks_service_role_arn", databricks_service_role.arn)

    # Note: from here you would create the Databricks workspace itself using the
    # exported role ARN, either through the Databricks account console, the
    # Databricks REST APIs, or a custom/dynamic Pulumi resource that wraps those
    # APIs (a sketch follows below).
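
    To make that last step concrete, here is a rough, illustrative sketch of the dynamic-resource route: it wraps the Databricks E2 Account API's workspace-creation endpoint (POST /api/2.0/accounts/{account_id}/workspaces) in a Pulumi dynamic provider. The host, authentication scheme, and property names are assumptions based on that API, the environment variables are placeholders, and a real provider would also need update and delete handlers.

    import os

    import pulumi.dynamic as dynamic
    import requests

    ACCOUNT_HOST = "https://accounts.cloud.databricks.com"
    ACCOUNT_ID = os.environ["DATABRICKS_ACCOUNT_ID"]  # placeholder
    # Placeholder account-level credentials; newer accounts may require OAuth instead.
    AUTH = (os.environ["DATABRICKS_USER"], os.environ["DATABRICKS_PASSWORD"])

    class WorkspaceProvider(dynamic.ResourceProvider):
        def create(self, props):
            # Ask the Account API to create an E2 workspace from previously
            # registered credentials and storage configurations.
            resp = requests.post(
                f"{ACCOUNT_HOST}/api/2.0/accounts/{ACCOUNT_ID}/workspaces",
                auth=AUTH,
                json={
                    "workspace_name": props["workspace_name"],
                    "aws_region": props["aws_region"],
                    "credentials_id": props["credentials_id"],
                    "storage_configuration_id": props["storage_configuration_id"],
                },
            )
            resp.raise_for_status()
            body = resp.json()
            return dynamic.CreateResult(id_=str(body["workspace_id"]), outs={**props, **body})

    class Workspace(dynamic.Resource):
        def __init__(self, name: str, props: dict, opts=None):
            super().__init__(WorkspaceProvider(), name, props, opts)

    Declaring a Workspace("main", {...}) resource alongside the IAM resources above would then fold workspace creation into the same pulumi up.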

    Remember, after the Pulumi run completes, take the IAM role ARN exported by Pulumi (databricks_service_role_arn) and supply it to your Databricks workspace configuration. This is usually done in the Databricks account console or via the Databricks REST API.
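
    On the REST side, the role ARN gets registered with Databricks as a credentials configuration. Below is a hedged sketch assuming the E2 Account API's POST /api/2.0/accounts/{account_id}/credentials endpoint; the account ID and login values are placeholders, and the returned credentials_id is what the workspace-creation sketch above consumes.

    import os

    import requests

    ACCOUNT_HOST = "https://accounts.cloud.databricks.com"
    ACCOUNT_ID = os.environ["DATABRICKS_ACCOUNT_ID"]  # placeholder
    AUTH = (os.environ["DATABRICKS_USER"], os.environ["DATABRICKS_PASSWORD"])  # placeholder

    # Paste in the ARN printed by `pulumi stack output databricks_service_role_arn`.
    ROLE_ARN = "<databricks_service_role_arn>"

    resp = requests.post(
        f"{ACCOUNT_HOST}/api/2.0/accounts/{ACCOUNT_ID}/credentials",
        auth=AUTH,
        json={
            "credentials_name": "pulumi-cross-account-role",
            "aws_credentials": {"sts_role": {"role_arn": ROLE_ARN}},
        },
    )
    resp.raise_for_status()
    print("credentials_id:", resp.json()["credentials_id"])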

    Any further fine-grained permission settings within Databricks need to be managed directly through Databricks-specific tools or APIs. Pulumi automates infrastructure provisioning and management, but for application-layer configuration such as ACLs inside Databricks, it is common to reach for the native tools the service provides.