Automating Databricks Workspace Provisioning with Service Principals

Pulumi · Accepted Answer

To automate the provisioning of a Databricks workspace with service principals, you'll generally go through several steps:

Create a Service Principal - A service principal is an identity that applications, hosted services, and automated tools use to access Azure resources. Creating one involves registering an application in Azure AD (now Microsoft Entra ID) and then creating a service principal for that application.
Assign Roles and Permissions - Assign the service principal the roles it needs to manage the Databricks workspace, for example Contributor or Owner on the Azure subscription or resource group where the workspace will be deployed. A resource group scope grants less access than a subscription scope.
Provision the Databricks Workspace - Once the service principal has the necessary permissions, use it to provision the workspace by configuring your deployment tooling to authenticate to the Azure management API as that principal.
Configure the Databricks Workspace - Configure your workspace according to your organization's requirements, including network settings, storage, compute resources, and more.
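
For the provisioning step, one common way to make automation act as the service principal is to pass its credentials to the Pulumi CLI through the ARM_* environment variables that the Azure provider reads. The sketch below assumes that approach. The service_principal_env helper is hypothetical, and in practice the secret would come from a secret store rather than from source code.

```python
import os


def service_principal_env(client_id, client_secret, tenant_id, subscription_id, base_env=None):
    """Return a copy of the environment with service principal credentials set.

    ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID are
    the variables the Azure provider reads; this helper itself is illustrative.
    """
    values = {
        "ARM_CLIENT_ID": client_id,
        "ARM_CLIENT_SECRET": client_secret,
        "ARM_TENANT_ID": tenant_id,
        "ARM_SUBSCRIPTION_ID": subscription_id,
    }
    # Fail early instead of letting the provider fall back to other credentials.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError("missing credential values: " + ", ".join(missing))

    env = dict(os.environ if base_env is None else base_env)
    env.update(values)
    return env
```

A pipeline step could then call subprocess.run(["pulumi", "up", "--yes"], env=service_principal_env(...), check=True) so that the deployment runs with the service principal's permissions rather than those of a logged-in user.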

Below is a Pulumi program written in Python that illustrates how you could automate the setup of a Databricks workspace using service principals. Note that the actual implementation may vary based on your specific Azure subscription, tenant setup, and the permissions you intend to grant.

import pulumi
import pulumi_azure as azure
import pulumi_azuread as azuread

# Create an Azure AD application for the Databricks service principal
ad_app = azuread.Application("databricks-app",
    display_name="DatabricksWorkspaceApp"
)

# Create a service principal for the Azure AD application
# (recent pulumi_azuread releases use client_id; older ones call it application_id)
service_principal = azuread.ServicePrincipal("databricks-sp",
    client_id=ad_app.client_id
)

# Create a password for the service principal
# (a far-future expiry is convenient for a demo; prefer a shorter lifetime and rotation)
sp_password = azuread.ServicePrincipalPassword("databricks-sp-password",
    service_principal_id=service_principal.id,
    end_date="2099-01-01T00:00:00Z"
)

# Example of assigning a role to the service principal (required if it is to deploy or manage Azure resources)
# role_assignment = azure.authorization.Assignment("databricks-sp-role-assignment",
#    principal_id=service_principal.object_id,
#    role_definition_name="Contributor", # or another role suitable for your workspace requirements
#    scope=azure.core.get_subscription().id # the scope could be the subscription or a resource group
# )

# Provision the Azure Databricks workspace
# (it is created with whatever credentials the Azure provider is configured to use)
workspace = azure.databricks.Workspace("databricks-workspace",
    resource_group_name="<resource-group-name>",
    sku="standard", # There are various SKUs available: standard, premium, etc.
    managed_resource_group_name="example-managed-rg",
    custom_parameters=azure.databricks.WorkspaceCustomParametersArgs(
        virtual_network_id="/subscriptions/<subscription-id>/resourceGroups/<vnet-resource-group>/providers/Microsoft.Network/virtualNetworks/<vnet-name>",
        public_subnet_name="<public-subnet-name>",
        private_subnet_name="<private-subnet-name>",
        # Deploying into a custom VNet also needs the NSG association ID of each subnet
        public_subnet_network_security_group_association_id="<public-subnet-nsg-association-id>",
        private_subnet_network_security_group_association_id="<private-subnet-nsg-association-id>",
    )
)

# (Optional) Set any configurations on the Databricks workspace if needed

# Export the configuration
pulumi.export("workspaceUrl", workspace.workspace_url)
pulumi.export("servicePrincipalId", service_principal.object_id)

In this program:

We create an Azure AD application (azuread.Application), which represents the identity of the application in Azure.
A service principal (azuread.ServicePrincipal) is then created for this AD application. This principal is the identity that is granted permissions to manage Azure resources.
We create a password for the service principal (azuread.ServicePrincipalPassword), which automated tools or other parts of your system can use to authenticate as the principal. Treat its value as a secret.
(Commented out) We assign a role to the service principal. In the classic Azure provider the resource for this is azure.authorization.Assignment. This step is necessary if the service principal needs permissions to deploy, access, or manage Azure resources.
Finally, we provision a Databricks workspace, defining the SKU and any necessary parameters, such as network configuration if you're deploying into a custom VNet. Note that the workspace is created with whatever credentials the Pulumi Azure provider is configured to use. For the new service principal to perform the deployment, its credentials must be supplied to the provider.
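
The password in the program expires in 2099. A shorter lifetime with rotation is usually easier to defend. Here is a minimal sketch of computing an RFC 3339 timestamp suitable for the end_date argument. The helper name and the 90-day lifetime are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def password_end_date(days_valid, now=None):
    """Return an RFC 3339 UTC timestamp `days_valid` days after `now`."""
    if days_valid <= 0:
        raise ValueError("days_valid must be positive")
    start = now if now is not None else datetime.now(timezone.utc)
    expiry = start.astimezone(timezone.utc) + timedelta(days=days_valid)
    return expiry.strftime("%Y-%m-%dT%H:%M:%SZ")


# A pinned start date keeps the result stable between runs.
print(password_end_date(90, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
# prints 2024-03-31T00:00:00Z
```

Passing a fixed start date (for example one read from stack configuration) keeps the value stable. A timestamp derived from the current time would differ on every run and show up as a change to the password resource on each update.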

The pulumi.export statements are used to output the URL of the Databricks workspace and the ID of the service principal so that they can be easily accessed outside of Pulumi, such as in CI/CD pipelines or other automation tools.
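
As one way to consume those outputs from a pipeline step, the sketch below reads them through the pulumi stack output --json command. The helper names are hypothetical, and the sample JSON holds made-up values that only mirror the two export names used in the program.

```python
import json
import subprocess


def parse_stack_outputs(raw_json):
    """Parse the JSON object printed by `pulumi stack output --json`."""
    outputs = json.loads(raw_json)
    if not isinstance(outputs, dict):
        raise ValueError("expected a JSON object of stack outputs")
    return outputs


def read_stack_outputs(stack=None):
    """Run the Pulumi CLI and return the stack outputs as a dict."""
    command = ["pulumi", "stack", "output", "--json"]
    if stack:
        command += ["--stack", stack]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_stack_outputs(result.stdout)


# Made-up sample standing in for real CLI output.
sample = '{"workspaceUrl": "example.azuredatabricks.net", "servicePrincipalId": "0000"}'
outputs = parse_stack_outputs(sample)
print(outputs["workspaceUrl"])
```

Keeping the parsing separate from the CLI call makes the pipeline logic easy to test without a deployed stack.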

Remember to replace placeholder values (like <subscription-id>, <vnet-resource-group>, etc.) with actual values from your Azure environment. The specific roles and scope will also depend on the level of access you wish the service principal to have.
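
A small helper can make the substitution less error-prone by building the Azure resource IDs from real values and rejecting anything that still looks like a placeholder. This is an illustrative sketch. The function names are hypothetical and not part of Pulumi or any Azure SDK.

```python
import re

# Matches leftover markers such as <subscription-id>.
_PLACEHOLDER = re.compile(r"<[^<>]+>")


def _check(value, name):
    """Reject empty values and leftover <placeholder> markers."""
    if not value or _PLACEHOLDER.search(value):
        raise ValueError(f"{name} is empty or still a placeholder: {value!r}")
    return value


def subscription_scope(subscription_id):
    """Scope string covering a whole subscription."""
    return "/subscriptions/" + _check(subscription_id, "subscription_id")


def resource_group_scope(subscription_id, resource_group):
    """Scope string covering a single resource group."""
    return (subscription_scope(subscription_id)
            + "/resourceGroups/" + _check(resource_group, "resource_group"))


def vnet_id(subscription_id, resource_group, vnet_name):
    """Resource ID of a virtual network, in the format used in the program."""
    return (resource_group_scope(subscription_id, resource_group)
            + "/providers/Microsoft.Network/virtualNetworks/"
            + _check(vnet_name, "vnet_name"))
```

Scoping a role assignment with resource_group_scope(...) instead of subscription_scope(...) limits the service principal to a single resource group rather than the whole subscription.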