1. Monitoring Databricks Jobs and Performance with MWS Logs.

    To monitor Databricks jobs and performance using MWS logs, you can use the databricks.MwsLogDelivery resource, which configures the delivery of account-level logs (for example, audit logs) to a destination you control for analysis and monitoring.

    In our Pulumi program, we will configure the delivery of log data from Databricks by setting up the necessary components:

    • databricks.MwsLogDelivery: This resource will be used to configure the log delivery settings. You need to specify the type of logs, the output format, and the destination where the logs will be delivered.
    • databricks.MwsCredentials: Before setting up log delivery, we need credentials that the log delivery service will use to write logs to the destination (in this example, an S3 bucket). This resource represents the necessary credentials.
    • databricks.MwsStorageConfigurations: This resource represents the storage configuration in AWS where the logs will be delivered, i.e. the S3 bucket that Databricks writes to. The log delivery configuration then ties the credentials and this storage configuration together.

    Here’s a Pulumi program written in Python that achieves this. The program assumes you already have an Amazon S3 bucket configured where the logs should be delivered. This is a simplified example to give you a starting point. You may need to replace placeholder values with actual values from your Databricks and AWS environments.

    import pulumi
    import pulumi_databricks as databricks

    # Cross-account credentials that Databricks will use to write logs to the S3 bucket
    mws_credentials = databricks.MwsCredentials("mws-credentials",
        account_id="your-databricks-account-id",
        role_arn="arn:aws:iam::123456789012:role/DatabricksLogDeliveryRole",
        credentials_name="MyDatabricksCredentials")

    # Storage configuration pointing at the S3 bucket where the logs will be delivered.
    # You need to have an S3 bucket ready where the logs will be stored.
    mws_storage_configuration = databricks.MwsStorageConfigurations("mws-storage-configuration",
        account_id="your-databricks-account-id",
        storage_configuration_name="MyDatabricksStorageConfiguration",
        bucket_name="my-databricks-logs-bucket")

    # Log delivery configuration that ties the credentials and storage configuration together
    mws_log_delivery = databricks.MwsLogDelivery("mws-log-delivery",
        account_id="your-databricks-account-id",
        log_type="AUDIT_LOGS",
        output_format="JSON",
        config_name="audit-log-delivery",
        delivery_path_prefix="logs/audit",
        credentials_id=mws_credentials.credentials_id,
        storage_configuration_id=mws_storage_configuration.storage_configuration_id)

    # Export the IDs of the created resources, which might be useful for future configurations
    pulumi.export('credentials_id', mws_credentials.credentials_id)
    pulumi.export('storage_configuration_id', mws_storage_configuration.storage_configuration_id)
    pulumi.export('log_delivery_id', mws_log_delivery.config_id)

    In the above program:

    • Replace your-databricks-account-id with the ID of your Databricks account.
    • Replace arn:aws:iam::123456789012:role/DatabricksLogDeliveryRole with the ARN of the IAM role that has the necessary permissions to write to your S3 bucket.
    • Adjust config_name to a human-readable name for this log delivery configuration.
    • You may need to adjust the delivery_path_prefix according to your organizational requirements. delivery_start_time (in YYYY-MM format) applies to billable usage log delivery and is shown in the sketch after this list.
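
    If you also want billable usage logs delivered to the same bucket, a second databricks.MwsLogDelivery resource can reuse the credentials and storage configuration created above. The sketch below is only a starting point: the config_name, delivery_path_prefix, and delivery_start_time values are illustrative placeholders, and billable usage logs are delivered in CSV format.

    # Optional: a second log delivery configuration for billable usage logs,
    # reusing the credentials and storage configuration created above.
    usage_log_delivery = databricks.MwsLogDelivery("mws-usage-log-delivery",
        account_id="your-databricks-account-id",
        log_type="BILLABLE_USAGE",
        output_format="CSV",                # usage logs are delivered as CSV files
        config_name="usage-log-delivery",   # illustrative name
        delivery_path_prefix="logs/usage",  # illustrative prefix
        # delivery_start_time only applies to billable usage logs, YYYY-MM format
        delivery_start_time="2023-01",
        credentials_id=mws_credentials.credentials_id,
        storage_configuration_id=mws_storage_configuration.storage_configuration_id)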

    Remember that proper IAM roles and policies must be set up in your AWS account to allow Databricks to write logs to your S3 bucket.
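
    As a rough illustration of those AWS-side prerequisites, the sketch below provisions a cross-account role and an inline policy with the pulumi_aws provider. It is an assumption-heavy sketch, not the canonical Databricks policy: the Databricks AWS account ID to trust, the exact S3 actions required, and the role and bucket names are placeholders you should take from the Databricks log delivery documentation and your own environment.

    import json
    import pulumi_aws as aws

    # Placeholders: take the real values from the Databricks log delivery documentation.
    DATABRICKS_AWS_ACCOUNT_ID = "<databricks-aws-account-id>"
    DATABRICKS_ACCOUNT_ID = "your-databricks-account-id"

    # Cross-account role that Databricks assumes when writing logs.
    log_delivery_role = aws.iam.Role("databricks-log-delivery-role",
        assume_role_policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": DATABRICKS_ACCOUNT_ID}},
            }],
        }))

    # Inline policy allowing the role to write into the log bucket.
    log_delivery_policy = aws.iam.RolePolicy("databricks-log-delivery-policy",
        role=log_delivery_role.id,
        policy=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::my-databricks-logs-bucket",
                    "arn:aws:s3:::my-databricks-logs-bucket/*",
                ],
            }],
        }))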

    You can find more information about these resources in the Pulumi Registry documentation for the Databricks provider.

    Using Pulumi to configure log delivery lets you manage and version this infrastructure as part of your code, so you can apply software engineering practices such as code reviews, version control, and CI/CD to infrastructure changes.
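
    For instance, because the program exports the credentials, storage configuration, and log delivery IDs, other Pulumi stacks can consume them instead of hard-coding values. The snippet below is a sketch that assumes a stack named my-org/databricks-logs/prod; substitute your own organization, project, and stack names.

    import pulumi

    # Reference the stack that created the log delivery resources
    # (the org/project/stack name below is a placeholder).
    logs_stack = pulumi.StackReference("my-org/databricks-logs/prod")

    # Reuse the exported IDs in another program, for example to attach additional
    # log delivery configurations to the same credentials and bucket.
    credentials_id = logs_stack.get_output("credentials_id")
    storage_configuration_id = logs_stack.get_output("storage_configuration_id")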