1. Unified Data Analytics with Databricks Storage Credentials


    A Databricks storage credential lets Databricks securely access an external storage service. With one in place, Databricks can read and analyze large datasets stored in, for example, AWS S3 buckets, Azure Blob Storage, or Google Cloud Storage.

    In this setup Databricks analyzes data that lives in your own cloud storage rather than holding a separate copy of it. To do this securely, you need to supply Databricks with credentials. How you provide them depends on the cloud provider you are using.

    Let's say you're using AWS and you want a storage credential that grants Databricks read-only access to an S3 bucket via an IAM role. The following Pulumi program demonstrates how you can provision such a storage credential using the pulumi_databricks package.

    Here's what each part of the program does:

    1. Import dependencies: We import the Pulumi SDK itself and the Pulumi Databricks provider package.

    2. Set up the storage credential: We create an instance of databricks.StorageCredential, passing a resource name, the owner, the AWS IAM role, and the metastore ID. The read-only flag is set to True, indicating that this credential will only allow read access.

    3. Export the storage credential ID: By exporting the storage_credential_id, you can reference this credential ID in other parts of your infrastructure as code.

    Here is the program:

    import pulumi
    import pulumi_databricks as databricks

    # Define the AWS IAM role that Databricks will assume to access the S3 bucket
    aws_iam_role = databricks.StorageCredentialAwsIamRoleArgs(
        role_arn="arn:aws:iam::123456789012:role/databricks-s3-access-role",
    )

    # Create a read-only storage credential in Databricks for accessing data in S3 using the IAM role
    storage_credential = databricks.StorageCredential(
        "readOnlyS3Credential",
        # Replace with the user or group that should own the credential
        owner="owner@example.com",
        # Attach the IAM role and mark this credential as read-only
        # (the Python SDK uses snake_case argument names)
        aws_iam_role=aws_iam_role,
        read_only=True,
        # You must specify the ID of the metastore this credential belongs to
        metastore_id="your-metastore-id",
    )

    # Output the storage credential ID
    pulumi.export("storage_credential_id", storage_credential.id)

    Remember to replace "arn:aws:iam::123456789012:role/databricks-s3-access-role" with the actual ARN of the IAM role you've configured for Databricks access to S3, "owner@example.com" with the user or group that should own the storage credential, and "your-metastore-id" with the ID of the metastore attached to your Databricks workspace.
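
    It is easy to deploy with a placeholder left in by accident. The helper below is a minimal sketch of a pre-deployment check, not part of Pulumi or the Databricks provider: check_settings, ROLE_ARN_PATTERN, and PLACEHOLDERS are hypothetical names, and the regular expression is a simplified approximation of the IAM role ARN format.

```python
import re

# Simplified, assumed shape of an IAM role ARN:
# partition, 12-digit account ID, then a role path/name.
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+"
)

# Placeholder values copied from the example program.
PLACEHOLDERS = {
    "arn:aws:iam::123456789012:role/databricks-s3-access-role",
    "owner@example.com",
    "your-metastore-id",
}


def check_settings(role_arn: str, owner: str, metastore_id: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not ROLE_ARN_PATTERN.fullmatch(role_arn):
        problems.append(f"value does not look like an IAM role ARN: {role_arn!r}")
    settings = (
        ("role ARN", role_arn),
        ("owner", owner),
        ("metastore ID", metastore_id),
    )
    for label, value in settings:
        if value in PLACEHOLDERS:
            problems.append(f"{label} still has the placeholder value {value!r}")
        elif not value.strip():
            problems.append(f"{label} is empty")
    return problems
```

    Calling check_settings with your real values before running the deployment, and stopping if the returned list is non-empty, catches the most common copy-and-paste mistakes. It does not confirm that the role or metastore actually exists.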

    In this program, we're making use of the Databricks provider for Pulumi to provision a storage credential. This program would be part of a larger Pulumi deployment where you would also define the necessary cloud resources such as the S3 buckets themselves, potentially the IAM role as well, and other Databricks resources needed for your unified data analytics setup.
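
    The read_only flag restricts what Databricks will do with the credential, but the IAM role itself should also permit only reads. The snippet below is a rough sketch of a read-only permissions policy document for that role, built with the Python standard library. The function name read_only_s3_policy and the bucket name "my-analytics-bucket" are hypothetical, and the action list is an assumption about a reasonable minimum; check the Databricks documentation for the exact permissions your setup needs.

```python
import json


def read_only_s3_policy(bucket_name: str) -> dict:
    """Build an IAM policy document allowing only read operations on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Read and list only; no Put or Delete actions are granted.
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # ListBucket applies to the bucket, GetObject to the objects in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy = read_only_s3_policy("my-analytics-bucket")
print(json.dumps(policy, indent=2))
```

    The resulting JSON string can be attached to the IAM role however you manage it, for instance as the policy body of an IAM resource defined in the same Pulumi program.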

    For detailed information about the Databricks provider in Pulumi, you can refer to the Pulumi Databricks Provider documentation. Pulumi's programming model allows you to define such credentials in a declarative manner, enabling secure, auditable, and reproducible deployments.

    Make sure to run pulumi up to deploy this configuration and provision the resources in your own Databricks environment. The output of the command will show you the ID of the created read-only storage credential.