Databricks v1.10.0, Mar 15, 2023

databricks.Cluster

Import

The cluster resource can be imported using its cluster ID:

 $ pulumi import databricks:index/cluster:Cluster this <cluster-id>

Create Cluster Resource

new Cluster(name: string, args: ClusterArgs, opts?: CustomResourceOptions);
@overload
def Cluster(resource_name: str,
            opts: Optional[ResourceOptions] = None,
            apply_policy_default_values: Optional[bool] = None,
            autoscale: Optional[ClusterAutoscaleArgs] = None,
            autotermination_minutes: Optional[int] = None,
            aws_attributes: Optional[ClusterAwsAttributesArgs] = None,
            azure_attributes: Optional[ClusterAzureAttributesArgs] = None,
            cluster_id: Optional[str] = None,
            cluster_log_conf: Optional[ClusterClusterLogConfArgs] = None,
            cluster_mount_infos: Optional[Sequence[ClusterClusterMountInfoArgs]] = None,
            cluster_name: Optional[str] = None,
            custom_tags: Optional[Mapping[str, Any]] = None,
            data_security_mode: Optional[str] = None,
            docker_image: Optional[ClusterDockerImageArgs] = None,
            driver_instance_pool_id: Optional[str] = None,
            driver_node_type_id: Optional[str] = None,
            enable_elastic_disk: Optional[bool] = None,
            enable_local_disk_encryption: Optional[bool] = None,
            gcp_attributes: Optional[ClusterGcpAttributesArgs] = None,
            idempotency_token: Optional[str] = None,
            init_scripts: Optional[Sequence[ClusterInitScriptArgs]] = None,
            instance_pool_id: Optional[str] = None,
            is_pinned: Optional[bool] = None,
            libraries: Optional[Sequence[ClusterLibraryArgs]] = None,
            node_type_id: Optional[str] = None,
            num_workers: Optional[int] = None,
            policy_id: Optional[str] = None,
            runtime_engine: Optional[str] = None,
            single_user_name: Optional[str] = None,
            spark_conf: Optional[Mapping[str, Any]] = None,
            spark_env_vars: Optional[Mapping[str, Any]] = None,
            spark_version: Optional[str] = None,
            ssh_public_keys: Optional[Sequence[str]] = None,
            workload_type: Optional[ClusterWorkloadTypeArgs] = None)
@overload
def Cluster(resource_name: str,
            args: ClusterArgs,
            opts: Optional[ResourceOptions] = None)
func NewCluster(ctx *Context, name string, args ClusterArgs, opts ...ResourceOption) (*Cluster, error)
public Cluster(string name, ClusterArgs args, CustomResourceOptions? opts = null)
public Cluster(String name, ClusterArgs args)
public Cluster(String name, ClusterArgs args, CustomResourceOptions options)
type: databricks:Cluster
properties: # The arguments to resource properties.
options: # Bag of options to control resource's behavior.

name string
The unique name of the resource.
args ClusterArgs
The arguments to resource properties.
opts CustomResourceOptions
Bag of options to control resource's behavior.
resource_name str
The unique name of the resource.
args ClusterArgs
The arguments to resource properties.
opts ResourceOptions
Bag of options to control resource's behavior.
ctx Context
Context object for the current deployment.
name string
The unique name of the resource.
args ClusterArgs
The arguments to resource properties.
opts ResourceOption
Bag of options to control resource's behavior.
name string
The unique name of the resource.
args ClusterArgs
The arguments to resource properties.
opts CustomResourceOptions
Bag of options to control resource's behavior.
name String
The unique name of the resource.
args ClusterArgs
The arguments to resource properties.
options CustomResourceOptions
Bag of options to control resource's behavior.
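
For a concrete starting point, the following is a minimal Python sketch of creating a cluster. It resolves the node type and runtime with the provider's lookup functions rather than hard-coding them; the resource and variable names are illustrative only.

import pulumi
import pulumi_databricks as databricks

# Resolve the smallest local-disk node type and the latest LTS runtime in this workspace.
smallest = databricks.get_node_type(local_disk=True)
latest_lts = databricks.get_spark_version(long_term_support=True)

shared_autoscaling = databricks.Cluster("sharedAutoscaling",
    cluster_name="Shared Autoscaling",
    spark_version=latest_lts.id,
    node_type_id=smallest.id,
    autotermination_minutes=20,
    autoscale=databricks.ClusterAutoscaleArgs(
        min_workers=1,
        max_workers=5,
    ))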

Cluster Resource Properties

To learn more about resource properties and how to use them, see Inputs and Outputs in the Architecture and Concepts docs.

Inputs

The Cluster resource accepts the following input properties:

SparkVersion string

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

ApplyPolicyDefaultValues bool

Whether to use policy default values for missing cluster attributes.

Autoscale ClusterAutoscaleArgs
AutoterminationMinutes int

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

AwsAttributes ClusterAwsAttributesArgs
AzureAttributes ClusterAzureAttributesArgs
ClusterId string
ClusterLogConf ClusterClusterLogConfArgs
ClusterMountInfos List<ClusterClusterMountInfoArgs>
ClusterName string

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

CustomTags Dictionary<string, object>

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

DataSecurityMode string

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

DockerImage ClusterDockerImageArgs
DriverInstancePoolId string

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

DriverNodeTypeId string

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

EnableElasticDisk bool

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

EnableLocalDiskEncryption bool

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

GcpAttributes ClusterGcpAttributesArgs
IdempotencyToken string

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

InitScripts List<ClusterInitScriptArgs>
InstancePoolId string

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

IsPinned bool

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. A workspace can have at most 70 pinned clusters, so apply may fail if you exceed that limit.

Libraries List<ClusterLibraryArgs>
NodeTypeId string

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

NumWorkers int

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

PolicyId string

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to let users create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant keys for spark_conf.

RuntimeEngine string

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

SingleUserName string

The optional name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

SparkConf Dictionary<string, object>

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

SparkEnvVars Dictionary<string, object>

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

SshPublicKeys List<string>

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

WorkloadType ClusterWorkloadTypeArgs
SparkVersion string

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

ApplyPolicyDefaultValues bool

Whether to use policy default values for missing cluster attributes.

Autoscale ClusterAutoscaleArgs
AutoterminationMinutes int

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

AwsAttributes ClusterAwsAttributesArgs
AzureAttributes ClusterAzureAttributesArgs
ClusterId string
ClusterLogConf ClusterClusterLogConfArgs
ClusterMountInfos []ClusterClusterMountInfoArgs
ClusterName string

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

CustomTags map[string]interface{}

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

DataSecurityMode string

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

DockerImage ClusterDockerImageArgs
DriverInstancePoolId string

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

DriverNodeTypeId string

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

EnableElasticDisk bool

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

EnableLocalDiskEncryption bool

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

GcpAttributes ClusterGcpAttributesArgs
IdempotencyToken string

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

InitScripts []ClusterInitScriptArgs
InstancePoolId string

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

IsPinned bool

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. A workspace can have at most 70 pinned clusters, so apply may fail if you exceed that limit.

Libraries []ClusterLibraryArgs
NodeTypeId string

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

NumWorkers int

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

PolicyId string

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to let users create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant keys for spark_conf.

RuntimeEngine string

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

SingleUserName string

The optional name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

SparkConf map[string]interface{}

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

SparkEnvVars map[string]interface{}

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

SshPublicKeys []string

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

WorkloadType ClusterWorkloadTypeArgs
sparkVersion String

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

applyPolicyDefaultValues Boolean

Whether to use policy default values for missing cluster attributes.

autoscale ClusterAutoscaleArgs
autoterminationMinutes Integer

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

awsAttributes ClusterAwsAttributesArgs
azureAttributes ClusterAzureAttributesArgs
clusterId String
clusterLogConf ClusterClusterLogConfArgs
clusterMountInfos List<ClusterClusterMountInfoArgs>
clusterName String

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

customTags Map<String,Object>

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

dataSecurityMode String

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

dockerImage ClusterDockerImageArgs
driverInstancePoolId String

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driverNodeTypeId String

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enableElasticDisk Boolean

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enableLocalDiskEncryption Boolean

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcpAttributes ClusterGcpAttributesArgs
idempotencyToken String

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

initScripts List<ClusterInitScriptArgs>
instancePoolId String

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

isPinned Boolean

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. A workspace can have at most 70 pinned clusters, so apply may fail if you exceed that limit.

libraries List<ClusterLibraryArgs>
nodeTypeId String

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

numWorkers Integer

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policyId String

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to let users create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant keys for spark_conf.

runtimeEngine String

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

singleUserName String

The optional name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

sparkConf Map<String,Object>

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

sparkEnvVars Map<String,Object>

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

sshPublicKeys List<String>

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

workloadType ClusterWorkloadTypeArgs
sparkVersion string

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

applyPolicyDefaultValues boolean

Whether to use policy default values for missing cluster attributes.

autoscale ClusterAutoscaleArgs
autoterminationMinutes number

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

awsAttributes ClusterAwsAttributesArgs
azureAttributes ClusterAzureAttributesArgs
clusterId string
clusterLogConf ClusterClusterLogConfArgs
clusterMountInfos ClusterClusterMountInfoArgs[]
clusterName string

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

customTags {[key: string]: any}

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

dataSecurityMode string

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

dockerImage ClusterDockerImageArgs
driverInstancePoolId string

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driverNodeTypeId string

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enableElasticDisk boolean

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enableLocalDiskEncryption boolean

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcpAttributes ClusterGcpAttributesArgs
idempotencyToken string

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

initScripts ClusterInitScriptArgs[]
instancePoolId string

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

isPinned boolean

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. A workspace can have at most 70 pinned clusters, so apply may fail if you exceed that limit.

libraries ClusterLibraryArgs[]
nodeTypeId string

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

numWorkers number

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policyId string

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to let users create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant keys for spark_conf.

runtimeEngine string

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

singleUserName string

The optional name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

sparkConf {[key: string]: any}

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

sparkEnvVars {[key: string]: any}

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

sshPublicKeys string[]

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

workloadType ClusterWorkloadTypeArgs
spark_version str

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

apply_policy_default_values bool

Whether to use policy default values for missing cluster attributes.

autoscale ClusterAutoscaleArgs
autotermination_minutes int

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

aws_attributes ClusterAwsAttributesArgs
azure_attributes ClusterAzureAttributesArgs
cluster_id str
cluster_log_conf ClusterClusterLogConfArgs
cluster_mount_infos Sequence[ClusterClusterMountInfoArgs]
cluster_name str

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

custom_tags Mapping[str, Any]

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

data_security_mode str

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

docker_image ClusterDockerImageArgs
driver_instance_pool_id str

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driver_node_type_id str

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enable_elastic_disk bool

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enable_local_disk_encryption bool

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcp_attributes ClusterGcpAttributesArgs
idempotency_token str

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

init_scripts Sequence[ClusterInitScriptArgs]
instance_pool_id str

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

is_pinned bool

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. A workspace can have at most 70 pinned clusters, so apply may fail if you exceed that limit.

libraries Sequence[ClusterLibraryArgs]
node_type_id str

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

num_workers int

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policy_id str

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to let users create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant keys for spark_conf.

runtime_engine str

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

single_user_name str

The optional name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

spark_conf Mapping[str, Any]

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

spark_env_vars Mapping[str, Any]

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

ssh_public_keys Sequence[str]

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

workload_type ClusterWorkloadTypeArgs
sparkVersion String

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

applyPolicyDefaultValues Boolean

Whether to use policy default values for missing cluster attributes.

autoscale Property Map
autoterminationMinutes Number

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

awsAttributes Property Map
azureAttributes Property Map
clusterId String
clusterLogConf Property Map
clusterMountInfos List<Property Map>
clusterName String

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

customTags Map<Any>

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

dataSecurityMode String

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

dockerImage Property Map
driverInstancePoolId String

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driverNodeTypeId String

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enableElasticDisk Boolean

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enableLocalDiskEncryption Boolean

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcpAttributes Property Map
idempotencyToken String

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

initScripts List<Property Map>
instancePoolId String

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

isPinned Boolean

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. A workspace can have at most 70 pinned clusters, so apply may fail if you exceed that limit.

libraries List<Property Map>
nodeTypeId String

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

numWorkers Number

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policyId String

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to let users create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant keys for spark_conf.

runtimeEngine String

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

singleUserName String

The optional name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

sparkConf Map<Any>

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

sparkEnvVars Map<Any>

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

sshPublicKeys List<String>

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

workloadType Property Map
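
To show how several of these inputs combine in practice, here is a hedged Python sketch of a single-node cluster; the spark_conf keys and the ResourceClass tag follow Databricks' documented single-node configuration, and the names are illustrative only.

import pulumi_databricks as databricks

smallest = databricks.get_node_type(local_disk=True)
latest_lts = databricks.get_spark_version(long_term_support=True)

single_node = databricks.Cluster("singleNode",
    cluster_name="Single Node",
    spark_version=latest_lts.id,
    node_type_id=smallest.id,
    autotermination_minutes=20,
    num_workers=0,  # driver only: num_workers + 1 == 1 Spark node
    spark_conf={
        # Required Spark settings for a single-node cluster.
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    custom_tags={
        "ResourceClass": "SingleNode",  # required tag for single-node clusters
    })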

Outputs

All input properties are implicitly available as output properties. Additionally, the Cluster resource produces the following output properties:

DefaultTags Dictionary<string, object>

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

Id string

The provider-assigned unique ID for this managed resource.

State string

(string) State of the cluster.

Url string
DefaultTags map[string]interface{}

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

Id string

The provider-assigned unique ID for this managed resource.

State string

(string) State of the cluster.

Url string
defaultTags Map<String,Object>

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

id String

The provider-assigned unique ID for this managed resource.

state String

(string) State of the cluster.

url String
defaultTags {[key: string]: any}

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

id string

The provider-assigned unique ID for this managed resource.

state string

(string) State of the cluster.

url string
default_tags Mapping[str, Any]

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

id str

The provider-assigned unique ID for this managed resource.

state str

(string) State of the cluster.

url str
defaultTags Map<Any>

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

id String

The provider-assigned unique ID for this managed resource.

state String

(string) State of the cluster.

url String
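
Continuing the Python sketch above, the computed outputs can be exported as stack outputs:

import pulumi

# Export outputs computed by the provider after the cluster is created.
pulumi.export("clusterId", shared_autoscaling.cluster_id)
pulumi.export("clusterState", shared_autoscaling.state)
pulumi.export("clusterUrl", shared_autoscaling.url)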

Look up Existing Cluster Resource

Get an existing Cluster resource’s state with the given name, ID, and optional extra properties used to qualify the lookup.

public static get(name: string, id: Input<ID>, state?: ClusterState, opts?: CustomResourceOptions): Cluster
@staticmethod
def get(resource_name: str,
        id: str,
        opts: Optional[ResourceOptions] = None,
        apply_policy_default_values: Optional[bool] = None,
        autoscale: Optional[ClusterAutoscaleArgs] = None,
        autotermination_minutes: Optional[int] = None,
        aws_attributes: Optional[ClusterAwsAttributesArgs] = None,
        azure_attributes: Optional[ClusterAzureAttributesArgs] = None,
        cluster_id: Optional[str] = None,
        cluster_log_conf: Optional[ClusterClusterLogConfArgs] = None,
        cluster_mount_infos: Optional[Sequence[ClusterClusterMountInfoArgs]] = None,
        cluster_name: Optional[str] = None,
        custom_tags: Optional[Mapping[str, Any]] = None,
        data_security_mode: Optional[str] = None,
        default_tags: Optional[Mapping[str, Any]] = None,
        docker_image: Optional[ClusterDockerImageArgs] = None,
        driver_instance_pool_id: Optional[str] = None,
        driver_node_type_id: Optional[str] = None,
        enable_elastic_disk: Optional[bool] = None,
        enable_local_disk_encryption: Optional[bool] = None,
        gcp_attributes: Optional[ClusterGcpAttributesArgs] = None,
        idempotency_token: Optional[str] = None,
        init_scripts: Optional[Sequence[ClusterInitScriptArgs]] = None,
        instance_pool_id: Optional[str] = None,
        is_pinned: Optional[bool] = None,
        libraries: Optional[Sequence[ClusterLibraryArgs]] = None,
        node_type_id: Optional[str] = None,
        num_workers: Optional[int] = None,
        policy_id: Optional[str] = None,
        runtime_engine: Optional[str] = None,
        single_user_name: Optional[str] = None,
        spark_conf: Optional[Mapping[str, Any]] = None,
        spark_env_vars: Optional[Mapping[str, Any]] = None,
        spark_version: Optional[str] = None,
        ssh_public_keys: Optional[Sequence[str]] = None,
        state: Optional[str] = None,
        url: Optional[str] = None,
        workload_type: Optional[ClusterWorkloadTypeArgs] = None) -> Cluster
func GetCluster(ctx *Context, name string, id IDInput, state *ClusterState, opts ...ResourceOption) (*Cluster, error)
public static Cluster Get(string name, Input<string> id, ClusterState? state, CustomResourceOptions? opts = null)
public static Cluster get(String name, Output<String> id, ClusterState state, CustomResourceOptions options)
Resource lookup is not supported in YAML
name
The unique name of the resulting resource.
id
The unique provider ID of the resource to lookup.
state
Any extra arguments used during the lookup.
opts
A bag of options that control this resource's behavior.
resource_name
The unique name of the resulting resource.
id
The unique provider ID of the resource to lookup.
name
The unique name of the resulting resource.
id
The unique provider ID of the resource to lookup.
state
Any extra arguments used during the lookup.
opts
A bag of options that control this resource's behavior.
name
The unique name of the resulting resource.
id
The unique provider ID of the resource to lookup.
state
Any extra arguments used during the lookup.
opts
A bag of options that control this resource's behavior.
name
The unique name of the resulting resource.
id
The unique provider ID of the resource to lookup.
state
Any extra arguments used during the lookup.
opts
A bag of options that control this resource's behavior.
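
As a Python sketch, an existing cluster can be looked up by its provider ID; the ID shown is a hypothetical placeholder.

import pulumi
import pulumi_databricks as databricks

# Look up an existing cluster by ID (placeholder value).
existing = databricks.Cluster.get("existing", "0123-456789-abcdef")
pulumi.export("existingState", existing.state)
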
The following state arguments are supported:
ApplyPolicyDefaultValues bool

Whether to use policy default values for missing cluster attributes.

Autoscale ClusterAutoscaleArgs
AutoterminationMinutes int

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

AwsAttributes ClusterAwsAttributesArgs
AzureAttributes ClusterAzureAttributesArgs
ClusterId string
ClusterLogConf ClusterClusterLogConfArgs
ClusterMountInfos List<ClusterClusterMountInfoArgs>
ClusterName string

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

CustomTags Dictionary<string, object>

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

DataSecurityMode string

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e., no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but use the values listed here.

DefaultTags Dictionary<string, object>

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

DockerImage ClusterDockerImageArgs
DriverInstancePoolId string

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

DriverNodeTypeId string

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

EnableElasticDisk bool

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

EnableLocalDiskEncryption bool

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

GcpAttributes ClusterGcpAttributesArgs
IdempotencyToken string

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, you can retry upon failure until the request succeeds. The Databricks platform guarantees that exactly one cluster will be launched with that idempotency token. The token may be at most 64 characters.

InitScripts List<ClusterInitScriptArgs>
InstancePoolId string

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.
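
For example, a cluster can draw all of its nodes from a pool created in the same program. A sketch, assuming the provider's databricks.InstancePool resource with illustrative values:

    import * as databricks from "@pulumi/databricks";

    // A small pool of idle instances; values are illustrative.
    const pool = new databricks.InstancePool("pool", {
        instancePoolName: "smallest-nodes",
        nodeTypeId: "i3.xlarge",
        idleInstanceAutoterminationMinutes: 10,
    });

    // Driver and workers are both allocated from the pool, so the
    // cluster itself does not need a node_type_id.
    const pooled = new databricks.Cluster("pooled", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        instancePoolId: pool.id,
        numWorkers: 2,
    });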

IsPinned bool

Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 70, so apply may fail if you have more than that.

Libraries List<ClusterLibraryArgs>
NodeTypeId string

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

NumWorkers int

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

PolicyId string

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to allow users to create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant spark_conf keys.

RuntimeEngine string

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

SingleUserName string

The optional user name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

SparkConf Dictionary<string, object>

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

SparkEnvVars Dictionary<string, object>

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
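
A sketch combining spark_conf and spark_env_vars; the keys and values are illustrative only:

    import * as databricks from "@pulumi/databricks";

    const tuned = new databricks.Cluster("tuned", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 2,
        // Custom Spark configuration properties.
        sparkConf: {
            "spark.sql.shuffle.partitions": "200",
        },
        // Exported as MY_ENV_VAR='staging' on the driver and workers.
        sparkEnvVars: {
            MY_ENV_VAR: "staging",
        },
    });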

SparkVersion string

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

SshPublicKeys List<string>

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.
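
A sketch that reads a local public key (the path is an assumption) and attaches it, enabling SSH to each node as ubuntu on port 2200:

    import * as fs from "fs";
    import * as databricks from "@pulumi/databricks";

    const sshable = new databricks.Cluster("sshable", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 1,
        // Read the key material from disk; trim the trailing newline.
        sshPublicKeys: [fs.readFileSync("id_rsa.pub", "utf8").trim()],
    });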

State string

(string) State of the cluster.

Url string
WorkloadType ClusterWorkloadTypeArgs
ApplyPolicyDefaultValues bool

Whether to use policy default values for missing cluster attributes.

Autoscale ClusterAutoscaleArgs
AutoterminationMinutes int

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.
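
For an interactive cluster, pairing autoscale with autotermination keeps costs bounded. A sketch with illustrative bounds:

    import * as databricks from "@pulumi/databricks";

    const interactive = new databricks.Cluster("interactive", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        // Scale between 1 and 8 workers as load changes.
        autoscale: { minWorkers: 1, maxWorkers: 8 },
        // Terminate after 30 idle minutes.
        autoterminationMinutes: 30,
    });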

AwsAttributes ClusterAwsAttributesArgs
AzureAttributes ClusterAzureAttributesArgs
ClusterId string
ClusterLogConf ClusterClusterLogConfArgs
ClusterMountInfos []ClusterClusterMountInfoArgs
ClusterName string

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

CustomTags map[string]interface{}

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

DataSecurityMode string

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e. no security features are enabled. In the Databricks UI this field has recently been renamed Access Mode, and USER_ISOLATION has been renamed Shared, but use the original values listed here.

DefaultTags map[string]interface{}

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

DockerImage ClusterDockerImageArgs
DriverInstancePoolId string

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

DriverNodeTypeId string

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

EnableElasticDisk bool

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

EnableLocalDiskEncryption bool

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

GcpAttributes ClusterGcpAttributesArgs
IdempotencyToken string

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request does not create a new cluster but returns the ID of the existing running cluster instead. If you specify the idempotency token, you can retry on failure until the request succeeds; the Databricks platform guarantees that exactly one cluster is launched with that idempotency token. The token may contain at most 64 characters.

InitScripts []ClusterInitScriptArgs
InstancePoolId string

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

IsPinned bool

Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 70, so apply may fail if you have more than that.

Libraries []ClusterLibraryArgs
NodeTypeId string

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

NumWorkers int

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

PolicyId string

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to allow users to create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant spark_conf keys.

RuntimeEngine string

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

SingleUserName string

The optional user name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

SparkConf map[string]interface{}

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

SparkEnvVars map[string]interface{}

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

SparkVersion string

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

SshPublicKeys []string

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

State string

(string) State of the cluster.

Url string
WorkloadType ClusterWorkloadTypeArgs
applyPolicyDefaultValues Boolean

Whether to use policy default values for missing cluster attributes.

autoscale ClusterAutoscaleArgs
autoterminationMinutes Integer

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

awsAttributes ClusterAwsAttributesArgs
azureAttributes ClusterAzureAttributesArgs
clusterId String
clusterLogConf ClusterClusterLogConfArgs
clusterMountInfos List<ClusterClusterMountInfoArgs>
clusterName String

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

customTags Map<String,Object>

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

dataSecurityMode String

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e. no security features are enabled. In the Databricks UI this field has recently been renamed Access Mode, and USER_ISOLATION has been renamed Shared, but use the original values listed here.

defaultTags Map<String,Object>

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

dockerImage ClusterDockerImageArgs
driverInstancePoolId String

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driverNodeTypeId String

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enableElasticDisk Boolean

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enableLocalDiskEncryption Boolean

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcpAttributes ClusterGcpAttributesArgs
idempotencyToken String

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request does not create a new cluster but returns the ID of the existing running cluster instead. If you specify the idempotency token, you can retry on failure until the request succeeds; the Databricks platform guarantees that exactly one cluster is launched with that idempotency token. The token may contain at most 64 characters.

initScripts List<ClusterInitScriptArgs>
instancePoolId String

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

isPinned Boolean

Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 70, so apply may fail if you have more than that.

libraries List<ClusterLibraryArgs>
nodeTypeId String

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

numWorkers Integer

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policyId String

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to allow users to create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant spark_conf keys.

runtimeEngine String

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

singleUserName String

The optional user name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

sparkConf Map<String,Object>

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

sparkEnvVars Map<String,Object>

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

sparkVersion String

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

sshPublicKeys List<String>

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

state String

(string) State of the cluster.

url String
workloadType ClusterWorkloadTypeArgs
applyPolicyDefaultValues boolean

Whether to use policy default values for missing cluster attributes.

autoscale ClusterAutoscaleArgs
autoterminationMinutes number

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

awsAttributes ClusterAwsAttributesArgs
azureAttributes ClusterAzureAttributesArgs
clusterId string
clusterLogConf ClusterClusterLogConfArgs
clusterMountInfos ClusterClusterMountInfoArgs[]
clusterName string

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

customTags {[key: string]: any}

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

dataSecurityMode string

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e. no security features are enabled. In the Databricks UI this field has recently been renamed Access Mode, and USER_ISOLATION has been renamed Shared, but use the original values listed here.

defaultTags {[key: string]: any}

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

dockerImage ClusterDockerImageArgs
driverInstancePoolId string

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driverNodeTypeId string

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enableElasticDisk boolean

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enableLocalDiskEncryption boolean

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcpAttributes ClusterGcpAttributesArgs
idempotencyToken string

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request does not create a new cluster but returns the ID of the existing running cluster instead. If you specify the idempotency token, you can retry on failure until the request succeeds; the Databricks platform guarantees that exactly one cluster is launched with that idempotency token. The token may contain at most 64 characters.

initScripts ClusterInitScriptArgs[]
instancePoolId string

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

isPinned boolean

Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 70, so apply may fail if you have more than that.

libraries ClusterLibraryArgs[]
nodeTypeId string

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

numWorkers number

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policyId string

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to allow users to create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant spark_conf keys.

runtimeEngine string

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

singleUserName string

The optional user name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

sparkConf {[key: string]: any}

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

sparkEnvVars {[key: string]: any}

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

sparkVersion string

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

sshPublicKeys string[]

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

state string

(string) State of the cluster.

url string
workloadType ClusterWorkloadTypeArgs
apply_policy_default_values bool

Whether to use policy default values for missing cluster attributes.

autoscale ClusterAutoscaleArgs
autotermination_minutes int

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

aws_attributes ClusterAwsAttributesArgs
azure_attributes ClusterAzureAttributesArgs
cluster_id str
cluster_log_conf ClusterClusterLogConfArgs
cluster_mount_infos Sequence[ClusterClusterMountInfoArgs]
cluster_name str

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

custom_tags Mapping[str, Any]

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

data_security_mode str

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e. no security features are enabled. In the Databricks UI this field has recently been renamed Access Mode, and USER_ISOLATION has been renamed Shared, but use the original values listed here.

default_tags Mapping[str, Any]

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

docker_image ClusterDockerImageArgs
driver_instance_pool_id str

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driver_node_type_id str

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enable_elastic_disk bool

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enable_local_disk_encryption bool

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcp_attributes ClusterGcpAttributesArgs
idempotency_token str

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request does not create a new cluster but returns the ID of the existing running cluster instead. If you specify the idempotency token, you can retry on failure until the request succeeds; the Databricks platform guarantees that exactly one cluster is launched with that idempotency token. The token may contain at most 64 characters.

init_scripts Sequence[ClusterInitScriptArgs]
instance_pool_id str

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

is_pinned bool

Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 70, so apply may fail if you have more than that.

libraries Sequence[ClusterLibraryArgs]
node_type_id str

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

num_workers int

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policy_id str

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to allow users to create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant spark_conf keys.

runtime_engine str

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

single_user_name str

The optional user name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

spark_conf Mapping[str, Any]

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

spark_env_vars Mapping[str, Any]

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

spark_version str

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

ssh_public_keys Sequence[str]

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

state str

(string) State of the cluster.

url str
workload_type ClusterWorkloadTypeArgs
applyPolicyDefaultValues Boolean

Whether to use policy default values for missing cluster attributes.

autoscale Property Map
autoterminationMinutes Number

Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.

awsAttributes Property Map
azureAttributes Property Map
clusterId String
clusterLogConf Property Map
clusterMountInfos List<Property Map>
clusterName String

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

customTags Map<Any>

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

dataSecurityMode String

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. Defaults to NONE, i.e. no security features are enabled. In the Databricks UI this field has recently been renamed Access Mode, and USER_ISOLATION has been renamed Shared, but use the original values listed here.

defaultTags Map<Any>

(map) Tags that are added by Databricks by default, regardless of any custom_tags that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.

dockerImage Property Map
driverInstancePoolId String

Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.

driverNodeTypeId String

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

enableElasticDisk Boolean

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.

enableLocalDiskEncryption Boolean

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

gcpAttributes Property Map
idempotencyToken String

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request does not create a new cluster but returns the ID of the existing running cluster instead. If you specify the idempotency token, you can retry on failure until the request succeeds; the Databricks platform guarantees that exactly one cluster is launched with that idempotency token. The token may contain at most 64 characters.

initScripts List<Property Map>
instancePoolId String

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

isPinned Boolean

Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 70, so apply may fail if you have more than that.

libraries List<Property Map>
nodeTypeId String

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

numWorkers Number

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

policyId String

Identifier of a Cluster Policy used to validate the cluster and preset certain defaults. The primary use of cluster policies is to allow users to create policy-scoped clusters via the UI rather than sharing configuration for API-created clusters. For example, when you specify the policy_id of an external metastore policy, you still have to fill in the relevant spark_conf keys.

runtimeEngine String

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.

singleUserName String

The optional user name of the user to assign to an interactive cluster. This field is required when using standard AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

sparkConf Map<Any>

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

sparkEnvVars Map<Any>

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

sparkVersion String

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

sshPublicKeys List<String>

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

state String

(string) State of the cluster.

url String
workloadType Property Map

Supporting Types

ClusterAutoscale

maxWorkers Integer
minWorkers Integer
maxWorkers number
minWorkers number
maxWorkers Number
minWorkers Number

ClusterAwsAttributes

ClusterAzureAttributes

ClusterClusterLogConf

ClusterClusterLogConfDbfs

ClusterClusterLogConfS3

Destination string
CannedAcl string
EnableEncryption bool
EncryptionType string
Endpoint string
KmsKey string
Region string
Destination string
CannedAcl string
EnableEncryption bool
EncryptionType string
Endpoint string
KmsKey string
Region string
destination String
cannedAcl String
enableEncryption Boolean
encryptionType String
endpoint String
kmsKey String
region String
destination string
cannedAcl string
enableEncryption boolean
encryptionType string
endpoint string
kmsKey string
region string
destination String
cannedAcl String
enableEncryption Boolean
encryptionType String
endpoint String
kmsKey String
region String
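
As an example of the fields above, shipping cluster logs to an S3 destination might look like this; the bucket, region, and ACL values are placeholders:

    import * as databricks from "@pulumi/databricks";

    const logged = new databricks.Cluster("logged", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 1,
        clusterLogConf: {
            s3: {
                destination: "s3://my-bucket/cluster-logs",
                region: "us-east-1",
                enableEncryption: true,
                cannedAcl: "bucket-owner-full-control",
            },
        },
    });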

ClusterClusterMountInfo

ClusterClusterMountInfoNetworkFilesystemInfo

ClusterDockerImage

ClusterDockerImageBasicAuth

Password string
Username string
Password string
Username string
password String
username String
password string
username string
password String
username String
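
A sketch pulling a custom container image with registry credentials; the registry URL is a placeholder, and the credentials are read from Pulumi config rather than hard-coded:

    import * as pulumi from "@pulumi/pulumi";
    import * as databricks from "@pulumi/databricks";

    const cfg = new pulumi.Config();
    const containerized = new databricks.Cluster("containerized", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 1,
        dockerImage: {
            url: "myregistry.example.com/spark-custom:latest",
            basicAuth: {
                username: cfg.require("registryUser"),
                // requireSecret keeps the password out of plain state output.
                password: cfg.requireSecret("registryPassword"),
            },
        },
    });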

ClusterGcpAttributes

ClusterInitScript

ClusterInitScriptAbfss

ClusterInitScriptDbfs

ClusterInitScriptFile

ClusterInitScriptGcs

ClusterInitScriptS3

Destination string
CannedAcl string
EnableEncryption bool
EncryptionType string
Endpoint string
KmsKey string
Region string
Destination string
CannedAcl string
EnableEncryption bool
EncryptionType string
Endpoint string
KmsKey string
Region string
destination String
cannedAcl String
enableEncryption Boolean
encryptionType String
endpoint String
kmsKey String
region String
destination string
cannedAcl string
enableEncryption boolean
encryptionType string
endpoint string
kmsKey string
region string
destination String
cannedAcl String
enableEncryption Boolean
encryptionType String
endpoint String
kmsKey String
region String
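
For instance, init scripts can be sourced from DBFS and S3 in the same list; the script paths below are placeholders:

    import * as databricks from "@pulumi/databricks";

    const withInit = new databricks.Cluster("with-init", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 1,
        // Each entry names exactly one source (dbfs, s3, file, ...).
        initScripts: [
            { dbfs: { destination: "dbfs:/init-scripts/install-deps.sh" } },
            { s3: { destination: "s3://my-bucket/init/extra.sh", region: "us-east-1" } },
        ],
    });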

ClusterLibrary

ClusterLibraryCran

Package string
Repo string
Package string
Repo string
package_ String
repo String
package string
repo string
package str
repo str
package String
repo String

ClusterLibraryMaven

Coordinates string
Exclusions List<string>
Repo string
Coordinates string
Exclusions []string
Repo string
coordinates String
exclusions List<String>
repo String
coordinates string
exclusions string[]
repo string
coordinates str
exclusions Sequence[str]
repo str
coordinates String
exclusions List<String>
repo String

ClusterLibraryPypi

Package string
Repo string
Package string
Repo string
package_ String
repo String
package string
repo string
package str
repo str
package String
repo String
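
A sketch installing one library from each repository type; the package coordinates are illustrative:

    import * as databricks from "@pulumi/databricks";

    const withLibs = new databricks.Cluster("with-libs", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 1,
        libraries: [
            { pypi: { package: "requests==2.28.1" } },
            { maven: { coordinates: "com.amazon.deequ:deequ:1.2.2-spark-3.0" } },
            { cran: { package: "data.table" } },
        ],
    });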

ClusterWorkloadType

ClusterWorkloadTypeClients

Jobs bool
Notebooks bool
Jobs bool
Notebooks bool
jobs Boolean
notebooks Boolean
jobs boolean
notebooks boolean
jobs bool
notebooks bool
jobs Boolean
notebooks Boolean
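
For example, to keep notebooks from attaching to a jobs-only cluster:

    import * as databricks from "@pulumi/databricks";

    const jobsOnly = new databricks.Cluster("jobs-only", {
        sparkVersion: "11.3.x-scala2.12", // placeholder runtime
        nodeTypeId: "i3.xlarge",          // placeholder node type
        numWorkers: 1,
        workloadType: {
            // Allow job workloads only; block notebook attachment.
            clients: { jobs: true, notebooks: false },
        },
    });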

Package Details

Repository
databricks pulumi/pulumi-databricks
License
Apache-2.0
Notes

This Pulumi package is based on the databricks Terraform Provider.