databricks.Cluster
Import
The resource cluster can be imported using the cluster ID:
$ pulumi import databricks:index/cluster:Cluster this <cluster-id>
Create Cluster Resource
new Cluster(name: string, args: ClusterArgs, opts?: CustomResourceOptions);
@overload
def Cluster(resource_name: str,
opts: Optional[ResourceOptions] = None,
apply_policy_default_values: Optional[bool] = None,
autoscale: Optional[ClusterAutoscaleArgs] = None,
autotermination_minutes: Optional[int] = None,
aws_attributes: Optional[ClusterAwsAttributesArgs] = None,
azure_attributes: Optional[ClusterAzureAttributesArgs] = None,
cluster_id: Optional[str] = None,
cluster_log_conf: Optional[ClusterClusterLogConfArgs] = None,
cluster_mount_infos: Optional[Sequence[ClusterClusterMountInfoArgs]] = None,
cluster_name: Optional[str] = None,
custom_tags: Optional[Mapping[str, Any]] = None,
data_security_mode: Optional[str] = None,
docker_image: Optional[ClusterDockerImageArgs] = None,
driver_instance_pool_id: Optional[str] = None,
driver_node_type_id: Optional[str] = None,
enable_elastic_disk: Optional[bool] = None,
enable_local_disk_encryption: Optional[bool] = None,
gcp_attributes: Optional[ClusterGcpAttributesArgs] = None,
idempotency_token: Optional[str] = None,
init_scripts: Optional[Sequence[ClusterInitScriptArgs]] = None,
instance_pool_id: Optional[str] = None,
is_pinned: Optional[bool] = None,
libraries: Optional[Sequence[ClusterLibraryArgs]] = None,
node_type_id: Optional[str] = None,
num_workers: Optional[int] = None,
policy_id: Optional[str] = None,
runtime_engine: Optional[str] = None,
single_user_name: Optional[str] = None,
spark_conf: Optional[Mapping[str, Any]] = None,
spark_env_vars: Optional[Mapping[str, Any]] = None,
spark_version: Optional[str] = None,
ssh_public_keys: Optional[Sequence[str]] = None,
workload_type: Optional[ClusterWorkloadTypeArgs] = None)
@overload
def Cluster(resource_name: str,
args: ClusterArgs,
opts: Optional[ResourceOptions] = None)
func NewCluster(ctx *Context, name string, args ClusterArgs, opts ...ResourceOption) (*Cluster, error)
public Cluster(string name, ClusterArgs args, CustomResourceOptions? opts = null)
public Cluster(String name, ClusterArgs args)
public Cluster(String name, ClusterArgs args, CustomResourceOptions options)
type: databricks:Cluster
properties: # The arguments to resource properties.
options: # Bag of options to control resource's behavior.
- name string
- The unique name of the resource.
- args ClusterArgs
- The arguments to resource properties.
- opts CustomResourceOptions
- Bag of options to control resource's behavior.
- resource_name str
- The unique name of the resource.
- args ClusterArgs
- The arguments to resource properties.
- opts ResourceOptions
- Bag of options to control resource's behavior.
- ctx Context
- Context object for the current deployment.
- name string
- The unique name of the resource.
- args ClusterArgs
- The arguments to resource properties.
- opts ResourceOption
- Bag of options to control resource's behavior.
- name string
- The unique name of the resource.
- args ClusterArgs
- The arguments to resource properties.
- opts CustomResourceOptions
- Bag of options to control resource's behavior.
- name String
- The unique name of the resource.
- args ClusterArgs
- The arguments to resource properties.
- options CustomResourceOptions
- Bag of options to control resource's behavior.
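Before the full property reference, a minimal hedged sketch of creating a cluster in TypeScript may help orient readers. It uses the same data sources as the examples further down; the cluster name, worker count, and autotermination value are arbitrary placeholders rather than recommendations.

import * as databricks from "@pulumi/databricks";

// Resolve a long-term-support runtime version and a small local-disk node type.
const latestLts = databricks.getSparkVersion({ longTermSupport: true });
const smallest = databricks.getNodeType({ localDisk: true });

// Minimal cluster: name, runtime, node type, and a fixed worker count.
const example = new databricks.Cluster("example", {
    clusterName: "Minimal Example",
    sparkVersion: latestLts.then(v => v.id),
    nodeTypeId: smallest.then(t => t.id),
    numWorkers: 1,              // fixed size; use autoscale instead for elastic sizing
    autoterminationMinutes: 20, // terminate after 20 idle minutes
});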
Cluster Resource Properties
To learn more about resource properties and how to use them, see Inputs and Outputs in the Architecture and Concepts docs.
Inputs
The Cluster resource accepts the following input properties:
- SparkVersion string
  Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- ApplyPolicyDefaultValues bool
  Whether to use policy default values for missing cluster attributes.
- Autoscale ClusterAutoscale
- AutoterminationMinutes int
  Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 60. We highly recommend having this setting present for Interactive/BI clusters.
- AwsAttributes ClusterAwsAttributes
- AzureAttributes ClusterAzureAttributes
- ClusterId string
- ClusterLogConf ClusterClusterLogConf
- ClusterMountInfos List<ClusterClusterMountInfo>
- ClusterName string
  Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- CustomTags Dictionary<string, object>
  Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.
- DataSecurityMode string
  Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. Use LEGACY_PASSTHROUGH for passthrough clusters and LEGACY_TABLE_ACL for Table ACL clusters. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode, and USER_ISOLATION has been renamed Shared, but use these terms here.
- DockerImage ClusterDockerImage
- DriverInstancePoolId string
  Similar to instance_pool_id, but for the driver node. If omitted and instance_pool_id is specified, the driver will be allocated from that pool.
- DriverNodeTypeId string
  The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.
- EnableElasticDisk bool
  If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have the autotermination_minutes and autoscale attributes set. More documentation is available on the cluster configuration page.
- EnableLocalDiskEncryption bool
  Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally, unique to each cluster node, and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node, and it is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- GcpAttributes ClusterGcpAttributes
- IdempotencyToken string
  An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster but will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure you can retry until the request succeeds. The Databricks platform guarantees to launch exactly one cluster with that idempotency token. The token may have at most 64 characters.
- InitScripts List<ClusterInitScript>
- InstancePoolId string
  To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and can be reused by a different cluster.
- IsPinned bool
  Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The number of pinned clusters is limited to 100, so apply may fail if you have more than that (this limit may change over time, so check the Databricks documentation for the current value).

The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi";
import * as databricks from "@pulumi/databricks";

const smallest = databricks.getNodeType({
    localDisk: true,
});
const latestLts = databricks.getSparkVersion({
    longTermSupport: true,
});
const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", {
    clusterName: "Shared Autoscaling",
    sparkVersion: latestLts.then(latestLts => latestLts.id),
    nodeTypeId: smallest.then(smallest => smallest.id),
    autoterminationMinutes: 20,
    autoscale: {
        minWorkers: 1,
        maxWorkers: 50,
    },
    sparkConf: {
        "spark.databricks.io.cache.enabled": true,
        "spark.databricks.io.cache.maxDiskUsage": "50g",
        "spark.databricks.io.cache.maxMetaDataCache": "1g",
    },
});
import pulumi
import pulumi_databricks as databricks

smallest = databricks.get_node_type(local_disk=True)
latest_lts = databricks.get_spark_version(long_term_support=True)
shared_autoscaling = databricks.Cluster("sharedAutoscaling",
    cluster_name="Shared Autoscaling",
    spark_version=latest_lts.id,
    node_type_id=smallest.id,
    autotermination_minutes=20,
    autoscale=databricks.ClusterAutoscaleArgs(
        min_workers=1,
        max_workers=50,
    ),
    spark_conf={
        "spark.databricks.io.cache.enabled": True,
        "spark.databricks.io.cache.maxDiskUsage": "50g",
        "spark.databricks.io.cache.maxMetaDataCache": "1g",
    })
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Databricks = Pulumi.Databricks;

return await Deployment.RunAsync(() =>
{
    var smallest = Databricks.GetNodeType.Invoke(new()
    {
        LocalDisk = true,
    });

    var latestLts = Databricks.GetSparkVersion.Invoke(new()
    {
        LongTermSupport = true,
    });

    var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new()
    {
        ClusterName = "Shared Autoscaling",
        SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id),
        NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id),
        AutoterminationMinutes = 20,
        Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs
        {
            MinWorkers = 1,
            MaxWorkers = 50,
        },
        SparkConf =
        {
            { "spark.databricks.io.cache.enabled", true },
            { "spark.databricks.io.cache.maxDiskUsage", "50g" },
            { "spark.databricks.io.cache.maxMetaDataCache", "1g" },
        },
    });
});
package main

import (
	"github.com/pulumi/pulumi-databricks/sdk/go/databricks"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{
			LocalDisk: pulumi.BoolRef(true),
		}, nil)
		if err != nil {
			return err
		}
		latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{
			LongTermSupport: pulumi.BoolRef(true),
		}, nil)
		if err != nil {
			return err
		}
		_, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{
			ClusterName:            pulumi.String("Shared Autoscaling"),
			SparkVersion:           *pulumi.String(latestLts.Id),
			NodeTypeId:             *pulumi.String(smallest.Id),
			AutoterminationMinutes: pulumi.Int(20),
			Autoscale: &databricks.ClusterAutoscaleArgs{
				MinWorkers: pulumi.Int(1),
				MaxWorkers: pulumi.Int(50),
			},
			SparkConf: pulumi.Map{
				"spark.databricks.io.cache.enabled":          pulumi.Any(true),
				"spark.databricks.io.cache.maxDiskUsage":     pulumi.Any("50g"),
				"spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.databricks.DatabricksFunctions;
import com.pulumi.databricks.inputs.GetNodeTypeArgs;
import com.pulumi.databricks.inputs.GetSparkVersionArgs;
import com.pulumi.databricks.Cluster;
import com.pulumi.databricks.ClusterArgs;
import com.pulumi.databricks.inputs.ClusterAutoscaleArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder()
            .localDisk(true)
            .build());

        final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder()
            .longTermSupport(true)
            .build());

        var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder()
            .clusterName("Shared Autoscaling")
            .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id()))
            .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id()))
            .autoterminationMinutes(20)
            .autoscale(ClusterAutoscaleArgs.builder()
                .minWorkers(1)
                .maxWorkers(50)
                .build())
            .sparkConf(Map.ofEntries(
                Map.entry("spark.databricks.io.cache.enabled", true),
                Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"),
                Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g")
            ))
            .build());
    }
}
resources:
  sharedAutoscaling:
    type: databricks:Cluster
    properties:
      clusterName: Shared Autoscaling
      sparkVersion: ${latestLts.id}
      nodeTypeId: ${smallest.id}
      autoterminationMinutes: 20
      autoscale:
        minWorkers: 1
        maxWorkers: 50
      sparkConf:
        spark.databricks.io.cache.enabled: true
        spark.databricks.io.cache.maxDiskUsage: 50g
        spark.databricks.io.cache.maxMetaDataCache: 1g
variables:
  smallest:
    fn::invoke:
      Function: databricks:getNodeType
      Arguments:
        localDisk: true
  latestLts:
    fn::invoke:
      Function: databricks:getSparkVersion
      Arguments:
        longTermSupport: true
- Libraries List<ClusterLibrary>
- NodeTypeId string
  Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.
- NumWorkers int
  Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors, for a total of num_workers + 1 Spark nodes.
- PolicyId string
- RuntimeEngine string
  The type of runtime engine to use. If not specified, the runtime engine type is inferred from the spark_version value. Allowed values include PHOTON and STANDARD.
- SingleUserName string
  The optional user name of the user to assign to an interactive cluster. This field is required when data_security_mode is set to SINGLE_USER, or when using AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).
- SparkConf Dictionary<string, object>
  Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- SparkEnvVars Dictionary<string, object>
  Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- SshPublicKeys List<string>
  SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.
- WorkloadType ClusterWorkloadType
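As a complement to the reference above, the following is a hedged sketch of a cluster configured for Unity Catalog single-user access through the data_security_mode and single_user_name inputs. The user name and sizing values are illustrative placeholders, not values from this page.

import * as databricks from "@pulumi/databricks";

const latestLts = databricks.getSparkVersion({ longTermSupport: true });
const smallest = databricks.getNodeType({ localDisk: true });

// Single-user cluster for Unity Catalog workloads.
// "someone@example.com" is a placeholder; supply a real workspace user or service principal.
const singleUser = new databricks.Cluster("singleUser", {
    clusterName: "Single User",
    sparkVersion: latestLts.then(v => v.id),
    nodeTypeId: smallest.then(t => t.id),
    autoterminationMinutes: 20,
    numWorkers: 1,
    dataSecurityMode: "SINGLE_USER",       // Unity Catalog requires SINGLE_USER or USER_ISOLATION
    singleUserName: "someone@example.com", // required when dataSecurityMode is SINGLE_USER
});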
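Where instance pools are in use, instance_pool_id (and optionally driver_instance_pool_id) attaches the cluster to a pre-warmed pool, and is_pinned marks the cluster as pinned (administrator only). The pool ID below is a hypothetical placeholder, not a value from this page.

import * as databricks from "@pulumi/databricks";

const latestLts = databricks.getSparkVersion({ longTermSupport: true });

// Hypothetical pool ID; in practice this would come from an instance pool resource
// or from stack configuration.
const poolId = "0000-000000-example00";

const pooled = new databricks.Cluster("pooled", {
    clusterName: "Pooled Cluster",
    sparkVersion: latestLts.then(v => v.id),
    instancePoolId: poolId,        // worker nodes are allocated from this pool; nodeTypeId is not needed
    driverInstancePoolId: poolId,  // optional; defaults to instancePoolId when omitted
    autoterminationMinutes: 20,
    numWorkers: 2,
    isPinned: true,                // requires Databricks administrator rights
});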
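The custom_tags and spark_env_vars maps described above accept arbitrary key-value pairs; here is a hedged sketch in which the tag names, environment variable, and values are illustrative placeholders.

import * as databricks from "@pulumi/databricks";

const latestLts = databricks.getSparkVersion({ longTermSupport: true });
const smallest = databricks.getNodeType({ localDisk: true });

const tagged = new databricks.Cluster("tagged", {
    clusterName: "Tagged Cluster",
    sparkVersion: latestLts.then(v => v.id),
    nodeTypeId: smallest.then(t => t.id),
    autoterminationMinutes: 20,
    numWorkers: 1,
    customTags: {
        // Applied to underlying cloud resources (e.g., EC2 instances and EBS volumes)
        // in addition to the Databricks default tags.
        team: "data-platform",
        costCenter: "12345",
    },
    sparkEnvVars: {
        // Exported as X='Y' when the driver and workers are launched.
        MY_FEATURE_FLAG: "enabled",
    },
});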
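Finally, ssh_public_keys and autotermination_minutes can be combined as below. The key material is a truncated placeholder, and setting autotermination_minutes to 0 explicitly disables automatic termination, as noted in the reference above.

import * as databricks from "@pulumi/databricks";

const latestLts = databricks.getSparkVersion({ longTermSupport: true });
const smallest = databricks.getNodeType({ localDisk: true });

const sshEnabled = new databricks.Cluster("sshEnabled", {
    clusterName: "SSH Enabled",
    sparkVersion: latestLts.then(v => v.id),
    nodeTypeId: smallest.then(t => t.id),
    numWorkers: 1,
    autoterminationMinutes: 0, // 0 disables automatic termination
    sshPublicKeys: [
        // Up to 10 keys; the matching private key can log in as user "ubuntu" on port 2200.
        "ssh-ed25519 AAAA...example user@example.com",
    ],
});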
- spark_
version str Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- apply_
policy_ booldefault_ values Whether to use policy default values for missing cluster attributes.
- autoscale
Cluster
Autoscale Args - autotermination_
minutes int Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- aws_
attributes ClusterAws Attributes Args - azure_
attributes ClusterAzure Attributes Args - cluster_
id str - cluster_
log_ Clusterconf Cluster Log Conf Args - cluster_
mount_ Sequence[Clusterinfos Cluster Mount Info Args] - cluster_
name str Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- Mapping[str, Any]
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with anx_
when it is propagated.- data_
security_ strmode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has been recently been renamed Access Mode andUSER_ISOLATION
has been renamed Shared, but use these terms here.- docker_
image ClusterDocker Image Args - driver_
instance_ strpool_ id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- driver_
node_ strtype_ id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- enable_
elastic_ booldisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- enable_
local_ booldisk_ encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- gcp_
attributes ClusterGcp Attributes Args - idempotency_
token str An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- init_scripts Sequence[ClusterInitScriptArgs]
- instance_pool_id str
To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.
- is_pinned bool
Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 100, so apply may fail if you have more than that (this limit may change over time; check the Databricks documentation for the current value).
The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- libraries Sequence[ClusterLibraryArgs]
- node_type_id str
Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.
- num_workers int
Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.
- policy_id str
- runtime_engine str
The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include: PHOTON, STANDARD.
- single_user_name str
The optional user name of the user to assign to an interactive cluster. This field is required when using data_security_mode set to SINGLE_USER or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).
- spark_conf Mapping[str, Any]
Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- spark_env_vars Mapping[str, Any]
Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
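A short TypeScript sketch (cluster sizing, runtime version, node type, and the variable names are illustrative) of how such environment variables are supplied:

import * as databricks from "@pulumi/databricks";

// Each key/value pair is exported as X='Y' on the driver and the workers.
const envVarsCluster = new databricks.Cluster("envVarsCluster", {
    sparkVersion: "14.3.x-scala2.12",   // placeholder runtime version
    nodeTypeId: "i3.xlarge",            // placeholder node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    sparkEnvVars: {
        DATA_LAKE_ROOT: "/mnt/datalake",                   // illustrative variable
        PYSPARK_PYTHON: "/databricks/python3/bin/python3", // illustrative variable
    },
});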
- ssh_public_keys Sequence[str]
SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
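A brief TypeScript sketch (the key content is a truncated placeholder, and the other arguments are illustrative):

import * as databricks from "@pulumi/databricks";

// Up to 10 public keys may be supplied; the matching private keys can then be used
// to SSH to the nodes as the ubuntu user on port 2200.
const sshCluster = new databricks.Cluster("sshCluster", {
    sparkVersion: "14.3.x-scala2.12",   // placeholder runtime version
    nodeTypeId: "i3.xlarge",            // placeholder node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    sshPublicKeys: [
        "ssh-rsa AAAAB3NzaC1yc2E... admin@example.com", // placeholder public key
    ],
});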
- workload_type ClusterWorkloadTypeArgs
- spark
Version String Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- apply
Policy BooleanDefault Values Whether to use policy default values for missing cluster attributes.
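As a rough TypeScript sketch (the policy ID is a placeholder for an existing cluster policy; whether omitted attributes are actually filled in depends on that policy's definition):

import * as databricks from "@pulumi/databricks";

// Attach the cluster to an existing policy and let the policy's default values
// fill in attributes that are not set explicitly here.
const governedCluster = new databricks.Cluster("governedCluster", {
    sparkVersion: "14.3.x-scala2.12",   // placeholder runtime version
    nodeTypeId: "i3.xlarge",            // placeholder node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    policyId: "ABC100000000000",        // placeholder policy ID
    applyPolicyDefaultValues: true,
});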
- autoscale Property Map
- autotermination
Minutes Number Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- aws
Attributes Property Map - azure
Attributes Property Map - cluster
Id String - cluster
Log Property MapConf - cluster
Mount List<Property Map>Infos - cluster
Name String Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- Map<Any>
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_
when it is propagated.- data
Security StringMode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.- docker
Image Property Map - driver
Instance StringPool Id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- driver
Node StringType Id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- enable
Elastic BooleanDisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- enable
Local BooleanDisk Encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- gcp
Attributes Property Map - idempotency
Token String An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- init
Scripts List<Property Map> - instance
Pool StringId To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- is
Pinned Boolean
Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 100, so apply may fail if you have more than that (this limit may change over time; check the Databricks documentation for the current value).
The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- libraries List<Property Map>
- node
Type StringId Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- num
Workers Number Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- policy
Id String - runtime
Engine String The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- single
User StringName The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- spark
Conf Map<Any> Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- spark
Env Map<Any>Vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- ssh
Public List<String>Keys SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
- workload
Type Property Map
Outputs
All input properties are implicitly available as output properties. Additionally, the Cluster resource produces the following output properties:
- Default Tags Dictionary<string, object>
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- Id string
The provider-assigned unique ID for this managed resource.
- State string
(string) State of the cluster.
- Url string
- Default Tags map[string]interface{}
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- Id string
The provider-assigned unique ID for this managed resource.
- State string
(string) State of the cluster.
- Url string
- default Tags Map<String,Object>
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- id String
The provider-assigned unique ID for this managed resource.
- state String
(string) State of the cluster.
- url String
- default Tags {[key: string]: any}
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- id string
The provider-assigned unique ID for this managed resource.
- state string
(string) State of the cluster.
- url string
- default_tags Mapping[str, Any]
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- id str
The provider-assigned unique ID for this managed resource.
- state str
(string) State of the cluster.
- url str
- default Tags Map<Any>
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- id String
The provider-assigned unique ID for this managed resource.
- state String
(string) State of the cluster.
- url String
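For illustration, a small TypeScript sketch (the cluster arguments are placeholders) of reading these computed outputs from the resource object:

import * as databricks from "@pulumi/databricks";

const example = new databricks.Cluster("example", {
    sparkVersion: "14.3.x-scala2.12",   // placeholder runtime version
    nodeTypeId: "i3.xlarge",            // placeholder node type
    numWorkers: 1,
    autoterminationMinutes: 20,
});

// default_tags, state, and url are outputs computed by the provider.
export const exampleDefaultTags = example.defaultTags;
export const exampleState = example.state;
export const exampleUrl = example.url;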
Look up Existing Cluster Resource
Get an existing Cluster resource’s state with the given name, ID, and optional extra properties used to qualify the lookup.
public static get(name: string, id: Input<ID>, state?: ClusterState, opts?: CustomResourceOptions): Cluster
@staticmethod
def get(resource_name: str,
id: str,
opts: Optional[ResourceOptions] = None,
apply_policy_default_values: Optional[bool] = None,
autoscale: Optional[ClusterAutoscaleArgs] = None,
autotermination_minutes: Optional[int] = None,
aws_attributes: Optional[ClusterAwsAttributesArgs] = None,
azure_attributes: Optional[ClusterAzureAttributesArgs] = None,
cluster_id: Optional[str] = None,
cluster_log_conf: Optional[ClusterClusterLogConfArgs] = None,
cluster_mount_infos: Optional[Sequence[ClusterClusterMountInfoArgs]] = None,
cluster_name: Optional[str] = None,
custom_tags: Optional[Mapping[str, Any]] = None,
data_security_mode: Optional[str] = None,
default_tags: Optional[Mapping[str, Any]] = None,
docker_image: Optional[ClusterDockerImageArgs] = None,
driver_instance_pool_id: Optional[str] = None,
driver_node_type_id: Optional[str] = None,
enable_elastic_disk: Optional[bool] = None,
enable_local_disk_encryption: Optional[bool] = None,
gcp_attributes: Optional[ClusterGcpAttributesArgs] = None,
idempotency_token: Optional[str] = None,
init_scripts: Optional[Sequence[ClusterInitScriptArgs]] = None,
instance_pool_id: Optional[str] = None,
is_pinned: Optional[bool] = None,
libraries: Optional[Sequence[ClusterLibraryArgs]] = None,
node_type_id: Optional[str] = None,
num_workers: Optional[int] = None,
policy_id: Optional[str] = None,
runtime_engine: Optional[str] = None,
single_user_name: Optional[str] = None,
spark_conf: Optional[Mapping[str, Any]] = None,
spark_env_vars: Optional[Mapping[str, Any]] = None,
spark_version: Optional[str] = None,
ssh_public_keys: Optional[Sequence[str]] = None,
state: Optional[str] = None,
url: Optional[str] = None,
workload_type: Optional[ClusterWorkloadTypeArgs] = None) -> Cluster
func GetCluster(ctx *Context, name string, id IDInput, state *ClusterState, opts ...ResourceOption) (*Cluster, error)
public static Cluster Get(string name, Input<string> id, ClusterState? state, CustomResourceOptions? opts = null)
public static Cluster get(String name, Output<String> id, ClusterState state, CustomResourceOptions options)
Resource lookup is not supported in YAML
- name
- The unique name of the resulting resource.
- id
- The unique provider ID of the resource to lookup.
- state
- Any extra arguments used during the lookup.
- opts
- A bag of options that control this resource's behavior.
- resource_name
- The unique name of the resulting resource.
- id
- The unique provider ID of the resource to lookup.
- name
- The unique name of the resulting resource.
- id
- The unique provider ID of the resource to lookup.
- state
- Any extra arguments used during the lookup.
- opts
- A bag of options that control this resource's behavior.
- name
- The unique name of the resulting resource.
- id
- The unique provider ID of the resource to lookup.
- state
- Any extra arguments used during the lookup.
- opts
- A bag of options that control this resource's behavior.
- name
- The unique name of the resulting resource.
- id
- The unique provider ID of the resource to lookup.
- state
- Any extra arguments used during the lookup.
- opts
- A bag of options that control this resource's behavior.
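For example, a brief TypeScript sketch (the cluster ID shown is illustrative) of getting a reference to an existing cluster by its ID and reading its properties:

import * as databricks from "@pulumi/databricks";

// Look up an existing cluster by its provider ID; no new cluster is created.
const existing = databricks.Cluster.get("existing", "0123-456789-abcdefgh");

export const existingClusterName = existing.clusterName;
export const existingClusterState = existing.state;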
- Apply
Policy boolDefault Values Whether to use policy default values for missing cluster attributes.
- Autoscale
Cluster
Autoscale - Autotermination
Minutes int Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- Aws
Attributes ClusterAws Attributes - Azure
Attributes ClusterAzure Attributes - Cluster
Id string - Cluster
Log ClusterConf Cluster Log Conf - Cluster
Mount List<ClusterInfos Cluster Mount Info> - Cluster
Name string Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- Dictionary<string, object>
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_
when it is propagated.- Data
Security stringMode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.- Dictionary<string, object>
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- Docker
Image ClusterDocker Image - Driver
Instance stringPool Id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- Driver
Node stringType Id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- Enable
Elastic boolDisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- Enable
Local boolDisk Encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- Gcp
Attributes ClusterGcp Attributes - Idempotency
Token string An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- Init
Scripts List<ClusterInit Script> - Instance
Pool stringId To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- Is
Pinned bool
Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 100, so apply may fail if you have more than that (this limit may change over time; check the Databricks documentation for the current value).
The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- Libraries
List<Cluster
Library> - Node
Type stringId Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- Num
Workers int Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- Policy
Id string - Runtime
Engine string The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- Single
User stringName The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- Spark
Conf Dictionary<string, object> Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- Spark
Env Dictionary<string, object>Vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- Spark
Version string Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- Ssh
Public List<string>Keys SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
- State string
(string) State of the cluster.
- Url string
- Workload
Type ClusterWorkload Type
- Apply
Policy boolDefault Values Whether to use policy default values for missing cluster attributes.
- Autoscale
Cluster
Autoscale Args - Autotermination
Minutes int Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- Aws
Attributes ClusterAws Attributes Args - Azure
Attributes ClusterAzure Attributes Args - Cluster
Id string - Cluster
Log ClusterConf Cluster Log Conf Args - Cluster
Mount []ClusterInfos Cluster Mount Info Args - Cluster
Name string Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- map[string]interface{}
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_
when it is propagated.- Data
Security stringMode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.- map[string]interface{}
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- Docker
Image ClusterDocker Image Args - Driver
Instance stringPool Id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- Driver
Node stringType Id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- Enable
Elastic boolDisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- Enable
Local boolDisk Encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- Gcp
Attributes ClusterGcp Attributes Args - Idempotency
Token string An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- Init
Scripts []ClusterInit Script Args - Instance
Pool stringId To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- Is
Pinned bool
Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 100, so apply may fail if you have more than that (this limit may change over time; check the Databricks documentation for the current value).
The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- Libraries
[]Cluster
Library Args - Node
Type stringId Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- Num
Workers int Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- Policy
Id string - Runtime
Engine string The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- Single
User stringName The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- Spark
Conf map[string]interface{} Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- Spark
Env map[string]interface{}Vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- Spark
Version string Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- Ssh
Public []stringKeys SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
- State string
(string) State of the cluster.
- Url string
- Workload
Type ClusterWorkload Type Args
- apply
Policy BooleanDefault Values Whether to use policy default values for missing cluster attributes.
- autoscale
Cluster
Autoscale - autotermination
Minutes Integer Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- aws
Attributes ClusterAws Attributes - azure
Attributes ClusterAzure Attributes - cluster
Id String - cluster
Log ClusterConf Cluster Log Conf - cluster
Mount List<ClusterInfos Cluster Mount Info> - cluster
Name String Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- Map<String,Object>
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_
when it is propagated.- data
Security StringMode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.- Map<String,Object>
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- docker
Image ClusterDocker Image - driver
Instance StringPool Id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- driver
Node StringType Id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- enable
Elastic BooleanDisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- enable
Local BooleanDisk Encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- gcp
Attributes ClusterGcp Attributes - idempotency
Token String An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- init
Scripts List<ClusterInit Script> - instance
Pool StringId To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- is
Pinned Boolean
Boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The maximum number of pinned clusters is 100, so apply may fail if you have more than that (this limit may change over time; check the Databricks documentation for the current value).
The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- libraries
List<Cluster
Library> - node
Type Id String Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- num
Workers Integer Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- policy
Id String - runtime
Engine String The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- single
User Name String The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- spark
Conf Map<String,Object> Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- spark
Env Vars Map<String,Object> Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
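As a hedged sketch (the runtime version, node type, and variable names are assumed placeholders), environment variables can be set alongside sparkConf on the same cluster:
import * as databricks from "@pulumi/databricks";

const tuned = new databricks.Cluster("tuned", {
    clusterName: "Tuned",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    sparkConf: {
        "spark.sql.shuffle.partitions": "64",
    },
    // Each entry is exported as KEY='VALUE' on the driver and every worker.
    sparkEnvVars: {
        PYSPARK_PYTHON: "/databricks/python3/bin/python3",
        MY_FEATURE_FLAG: "enabled", // arbitrary example variable
    },
});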
- spark
Version String Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- ssh
Public Keys List<String> SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
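A minimal sketch, assuming an existing key pair on the machine running Pulumi; the key path, runtime version, and node type are illustrative:
import * as fs from "fs";
import * as databricks from "@pulumi/databricks";

// Assumes a key pair already exists locally; the path is illustrative.
const publicKey = fs.readFileSync("/home/me/.ssh/id_ed25519.pub", "utf8").trim();

const debuggable = new databricks.Cluster("debuggable", {
    clusterName: "SSH Debug",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    // Connect with: ssh -p 2200 ubuntu@<node-ip> using the matching private key.
    sshPublicKeys: [publicKey],
});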
- state String
(string) State of the cluster.
- url String
- workload
Type ClusterWorkload Type
- apply
Policy booleanDefault Values Whether to use policy default values for missing cluster attributes.
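As an illustrative sketch, this flag is typically paired with a policy; the databricks.ClusterPolicy definition, runtime version, and node type below are placeholder assumptions, not values from this page:
import * as databricks from "@pulumi/databricks";

// Placeholder policy that fixes autotermination; the JSON follows the Databricks
// cluster policy definition format and is shown only as an illustration.
const policy = new databricks.ClusterPolicy("autoterminate", {
    name: "Enforce autotermination",
    definition: JSON.stringify({
        autotermination_minutes: { type: "fixed", value: 20, hidden: true },
    }),
});

const governed = new databricks.Cluster("governed", {
    clusterName: "Governed",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    policyId: policy.id,
    // Let the policy supply any attributes omitted from this resource.
    applyPolicyDefaultValues: true,
});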
- autoscale
Cluster
Autoscale - autotermination
Minutes number Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- aws
Attributes ClusterAws Attributes - azure
Attributes ClusterAzure Attributes - cluster
Id string - cluster
Log ClusterConf Cluster Log Conf - cluster
Mount ClusterInfos Cluster Mount Info[] - cluster
Name string Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- customTags {[key: string]: any}
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with anx_
when it is propagated.- data
Security stringMode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.
- defaultTags {[key: string]: any}
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- docker
Image ClusterDocker Image - driver
Instance stringPool Id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- driver
Node stringType Id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- enable
Elastic booleanDisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- enable
Local booleanDisk Encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- gcp
Attributes ClusterGcp Attributes - idempotency
Token string An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- init
Scripts ClusterInit Script[] - instance
Pool stringId To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- is
Pinned boolean boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The pinned clusters' maximum number is limited to 100, so
apply
may fail if you have more than that (this number may change over time, so check Databricks documentation for actual number).The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- libraries
Cluster
Library[] - node
Type stringId Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- num
Workers number Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- policy
Id string - runtime
Engine string The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- single
User stringName The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- spark
Conf {[key: string]: any} Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- spark
Env {[key: string]: any}Vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- spark
Version string Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- ssh
Public string[]Keys SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
- state string
(string) State of the cluster.
- url string
- workload
Type ClusterWorkload Type
- apply_
policy_ booldefault_ values Whether to use policy default values for missing cluster attributes.
- autoscale
Cluster
Autoscale Args - autotermination_
minutes int Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- aws_
attributes ClusterAws Attributes Args - azure_
attributes ClusterAzure Attributes Args - cluster_
id str - cluster_
log_ Clusterconf Cluster Log Conf Args - cluster_
mount_ Sequence[Clusterinfos Cluster Mount Info Args] - cluster_
name str Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- custom_tags Mapping[str, Any]
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with anx_
when it is propagated.- data_
security_ strmode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.
- default_tags Mapping[str, Any]
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- docker_
image ClusterDocker Image Args - driver_
instance_ strpool_ id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- driver_
node_ strtype_ id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- enable_
elastic_ booldisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- enable_
local_ booldisk_ encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- gcp_
attributes ClusterGcp Attributes Args - idempotency_
token str An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- init_
scripts Sequence[ClusterInit Script Args] - instance_
pool_ strid To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- is_
pinned bool boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The pinned clusters' maximum number is limited to 100, so
apply
may fail if you have more than that (this number may change over time, so check Databricks documentation for actual number).The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- libraries
Sequence[Cluster
Library Args] - node_
type_ strid Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- num_
workers int Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- policy_
id str - runtime_
engine str The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- single_
user_ strname The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- spark_
conf Mapping[str, Any] Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- spark_
env_ Mapping[str, Any]vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- spark_
version str Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- ssh_
public_ Sequence[str]keys SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
- state str
(string) State of the cluster.
- url str
- workload_
type ClusterWorkload Type Args
- apply
Policy BooleanDefault Values Whether to use policy default values for missing cluster attributes.
- autoscale Property Map
- autotermination
Minutes Number Automatically terminate the cluster after being inactive for this time in minutes. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to
60
. We highly recommend having this setting present for Interactive/BI clusters.- aws
Attributes Property Map - azure
Attributes Property Map - cluster
Id String - cluster
Log Property MapConf - cluster
Mount List<Property Map>Infos - cluster
Name String Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.
- customTags Map<Any>
Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to
default_tags
. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with anx_
when it is propagated.- data
Security StringMode Select the security features of the cluster. Unity Catalog requires
SINGLE_USER
orUSER_ISOLATION
mode.LEGACY_PASSTHROUGH
for passthrough cluster andLEGACY_TABLE_ACL
for Table ACL cluster. If omitted, no security features are enabled. In the Databricks UI, this has recently been renamed Access Mode and USER_ISOLATION
has been renamed Shared, but use these terms here.
- defaultTags Map<Any>
(map) Tags that are added by Databricks by default, regardless of any
custom_tags
that may have been added. These include: Vendor: Databricks, Creator: <username_of_creator>, ClusterName: <name_of_cluster>, ClusterId: <id_of_cluster>, Name: , and any workspace and pool tags.- docker
Image Property Map - driver
Instance StringPool Id similar to
instance_pool_id
, but for driver node. If omitted, andinstance_pool_id
is specified, then the driver will be allocated from that pool.- driver
Node StringType Id The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as
node_type_id
defined above.- enable
Elastic BooleanDisk If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have
autotermination_minutes
andautoscale
attributes set. More documentation available at cluster configuration page.- enable
Local BooleanDisk Encryption Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.
- gcp
Attributes Property Map - idempotency
Token String An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the existing running cluster's ID instead. If you specify the idempotency token, upon failure, you can retry until the request succeeds. Databricks platform guarantees to launch exactly one cluster with that idempotency token. This token should have at most 64 characters.
- init
Scripts List<Property Map> - instance
Pool StringId To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to
TERMINATED
, the instances it used are returned to the pool and reused by a different cluster.- is
Pinned Boolean boolean value specifying if the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The pinned clusters' maximum number is limited to 100, so
apply
may fail if you have more than that (this number may change over time, so check Databricks documentation for actual number).The following example demonstrates how to create an autoscaling cluster with Delta Cache enabled:
import * as pulumi from "@pulumi/pulumi"; import * as databricks from "@pulumi/databricks";
const smallest = databricks.getNodeType({ localDisk: true, }); const latestLts = databricks.getSparkVersion({ longTermSupport: true, }); const sharedAutoscaling = new databricks.Cluster("sharedAutoscaling", { clusterName: "Shared Autoscaling", sparkVersion: latestLts.then(latestLts => latestLts.id), nodeTypeId: smallest.then(smallest => smallest.id), autoterminationMinutes: 20, autoscale: { minWorkers: 1, maxWorkers: 50, }, sparkConf: { "spark.databricks.io.cache.enabled": true, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", }, });
import pulumi import pulumi_databricks as databricks smallest = databricks.get_node_type(local_disk=True) latest_lts = databricks.get_spark_version(long_term_support=True) shared_autoscaling = databricks.Cluster("sharedAutoscaling", cluster_name="Shared Autoscaling", spark_version=latest_lts.id, node_type_id=smallest.id, autotermination_minutes=20, autoscale=databricks.ClusterAutoscaleArgs( min_workers=1, max_workers=50, ), spark_conf={ "spark.databricks.io.cache.enabled": True, "spark.databricks.io.cache.maxDiskUsage": "50g", "spark.databricks.io.cache.maxMetaDataCache": "1g", })
using System.Collections.Generic; using System.Linq; using Pulumi; using Databricks = Pulumi.Databricks; return await Deployment.RunAsync(() => { var smallest = Databricks.GetNodeType.Invoke(new() { LocalDisk = true, }); var latestLts = Databricks.GetSparkVersion.Invoke(new() { LongTermSupport = true, }); var sharedAutoscaling = new Databricks.Cluster("sharedAutoscaling", new() { ClusterName = "Shared Autoscaling", SparkVersion = latestLts.Apply(getSparkVersionResult => getSparkVersionResult.Id), NodeTypeId = smallest.Apply(getNodeTypeResult => getNodeTypeResult.Id), AutoterminationMinutes = 20, Autoscale = new Databricks.Inputs.ClusterAutoscaleArgs { MinWorkers = 1, MaxWorkers = 50, }, SparkConf = { { "spark.databricks.io.cache.enabled", true }, { "spark.databricks.io.cache.maxDiskUsage", "50g" }, { "spark.databricks.io.cache.maxMetaDataCache", "1g" }, }, }); });
package main import ( "github.com/pulumi/pulumi-databricks/sdk/go/databricks" "github.com/pulumi/pulumi/sdk/v3/go/pulumi" ) func main() { pulumi.Run(func(ctx *pulumi.Context) error { smallest, err := databricks.GetNodeType(ctx, &databricks.GetNodeTypeArgs{ LocalDisk: pulumi.BoolRef(true), }, nil) if err != nil { return err } latestLts, err := databricks.GetSparkVersion(ctx, &databricks.GetSparkVersionArgs{ LongTermSupport: pulumi.BoolRef(true), }, nil) if err != nil { return err } _, err = databricks.NewCluster(ctx, "sharedAutoscaling", &databricks.ClusterArgs{ ClusterName: pulumi.String("Shared Autoscaling"), SparkVersion: *pulumi.String(latestLts.Id), NodeTypeId: *pulumi.String(smallest.Id), AutoterminationMinutes: pulumi.Int(20), Autoscale: &databricks.ClusterAutoscaleArgs{ MinWorkers: pulumi.Int(1), MaxWorkers: pulumi.Int(50), }, SparkConf: pulumi.Map{ "spark.databricks.io.cache.enabled": pulumi.Any(true), "spark.databricks.io.cache.maxDiskUsage": pulumi.Any("50g"), "spark.databricks.io.cache.maxMetaDataCache": pulumi.Any("1g"), }, }) if err != nil { return err } return nil }) }
package generated_program; import com.pulumi.Context; import com.pulumi.Pulumi; import com.pulumi.core.Output; import com.pulumi.databricks.DatabricksFunctions; import com.pulumi.databricks.inputs.GetNodeTypeArgs; import com.pulumi.databricks.inputs.GetSparkVersionArgs; import com.pulumi.databricks.Cluster; import com.pulumi.databricks.ClusterArgs; import com.pulumi.databricks.inputs.ClusterAutoscaleArgs; import java.util.List; import java.util.ArrayList; import java.util.Map; import java.io.File; import java.nio.file.Files; import java.nio.file.Paths; public class App { public static void main(String[] args) { Pulumi.run(App::stack); } public static void stack(Context ctx) { final var smallest = DatabricksFunctions.getNodeType(GetNodeTypeArgs.builder() .localDisk(true) .build()); final var latestLts = DatabricksFunctions.getSparkVersion(GetSparkVersionArgs.builder() .longTermSupport(true) .build()); var sharedAutoscaling = new Cluster("sharedAutoscaling", ClusterArgs.builder() .clusterName("Shared Autoscaling") .sparkVersion(latestLts.applyValue(getSparkVersionResult -> getSparkVersionResult.id())) .nodeTypeId(smallest.applyValue(getNodeTypeResult -> getNodeTypeResult.id())) .autoterminationMinutes(20) .autoscale(ClusterAutoscaleArgs.builder() .minWorkers(1) .maxWorkers(50) .build()) .sparkConf(Map.ofEntries( Map.entry("spark.databricks.io.cache.enabled", true), Map.entry("spark.databricks.io.cache.maxDiskUsage", "50g"), Map.entry("spark.databricks.io.cache.maxMetaDataCache", "1g") )) .build()); } }
resources: sharedAutoscaling: type: databricks:Cluster properties: clusterName: Shared Autoscaling sparkVersion: ${latestLts.id} nodeTypeId: ${smallest.id} autoterminationMinutes: 20 autoscale: minWorkers: 1 maxWorkers: 50 sparkConf: spark.databricks.io.cache.enabled: true spark.databricks.io.cache.maxDiskUsage: 50g spark.databricks.io.cache.maxMetaDataCache: 1g variables: smallest: fn::invoke: Function: databricks:getNodeType Arguments: localDisk: true latestLts: fn::invoke: Function: databricks:getSparkVersion Arguments: longTermSupport: true
- libraries List<Property Map>
- node
Type StringId Any supported databricks.getNodeType id. If
instance_pool_id
is specified, this field is not needed.- num
Workers Number Number of worker nodes that this cluster should have. A cluster has one Spark driver and
num_workers
executors for a total ofnum_workers
+ 1 Spark nodes.- policy
Id String - runtime
Engine String The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:
PHOTON
,STANDARD
.- single
User StringName The optional user name of the user to assign to an interactive cluster. This field is required when using
data_security_mode
set toSINGLE_USER
or AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).- spark
Conf Map<Any> Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.
- spark
Env Map<Any>Vars Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.
- spark
Version String Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.
- ssh
Public List<String>Keys SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys.
- state String
(string) State of the cluster.
- url String
- workload
Type Property Map
Supporting Types
ClusterAutoscale, ClusterAutoscaleArgs
- Max
Workers int - Min
Workers int
- Max
Workers int - Min
Workers int
- max
Workers Integer - min
Workers Integer
- max
Workers number - min
Workers number
- max_
workers int - min_
workers int
- max
Workers Number - min
Workers Number
ClusterAwsAttributes, ClusterAwsAttributesArgs
- Availability string
- Ebs
Volume intCount - Ebs
Volume intSize - Ebs
Volume stringType - First
On intDemand - Instance
Profile stringArn - Spot
Bid intPrice Percent - Zone
Id string
- Availability string
- Ebs
Volume intCount - Ebs
Volume intSize - Ebs
Volume stringType - First
On intDemand - Instance
Profile stringArn - Spot
Bid intPrice Percent - Zone
Id string
- availability String
- ebs
Volume IntegerCount - ebs
Volume IntegerSize - ebs
Volume StringType - first
On IntegerDemand - instance
Profile StringArn - spot
Bid IntegerPrice Percent - zone
Id String
- availability string
- ebs
Volume numberCount - ebs
Volume numberSize - ebs
Volume stringType - first
On numberDemand - instance
Profile stringArn - spot
Bid numberPrice Percent - zone
Id string
- availability str
- ebs_
volume_ intcount - ebs_
volume_ intsize - ebs_
volume_ strtype - first_
on_ intdemand - instance_
profile_ strarn - spot_
bid_ intprice_ percent - zone_
id str
- availability String
- ebs
Volume NumberCount - ebs
Volume NumberSize - ebs
Volume StringType - first
On NumberDemand - instance
Profile StringArn - spot
Bid NumberPrice Percent - zone
Id String
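A hedged sketch of the AWS attributes block; the availability and volume-type values, runtime version, and node type are illustrative assumptions rather than prescriptions from this page:
import * as databricks from "@pulumi/databricks";

const spotHeavy = new databricks.Cluster("spotHeavy", {
    clusterName: "Spot Heavy",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 4,
    autoterminationMinutes: 20,
    awsAttributes: {
        availability: "SPOT_WITH_FALLBACK", // assumed value; ON_DEMAND and SPOT are the other documented options
        firstOnDemand: 1,                   // keep the driver on on-demand capacity
        zoneId: "auto",                     // assumed; lets Databricks pick the zone
        spotBidPricePercent: 100,
        ebsVolumeCount: 1,
        ebsVolumeType: "GENERAL_PURPOSE_SSD", // assumed value
        ebsVolumeSize: 100,                   // size in GB per volume
    },
});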
ClusterAzureAttributes, ClusterAzureAttributesArgs
- Availability string
- First
On intDemand - Spot
Bid doubleMax Price
- Availability string
- First
On intDemand - Spot
Bid float64Max Price
- availability String
- first
On IntegerDemand - spot
Bid DoubleMax Price
- availability string
- first
On numberDemand - spot
Bid numberMax Price
- availability str
- first_
on_ intdemand - spot_
bid_ floatmax_ price
- availability String
- first
On NumberDemand - spot
Bid NumberMax Price
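A similar hedged sketch for Azure; the availability value, bid price, runtime version, and node type are illustrative assumptions:
import * as databricks from "@pulumi/databricks";

const azureSpot = new databricks.Cluster("azureSpot", {
    clusterName: "Azure Spot",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "Standard_DS3_v2",    // assumed Azure node type
    numWorkers: 2,
    autoterminationMinutes: 20,
    azureAttributes: {
        availability: "SPOT_WITH_FALLBACK_AZURE", // assumed value
        firstOnDemand: 1,
        spotBidMaxPrice: -1, // -1 is commonly used to mean "pay up to the on-demand price"
    },
});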
ClusterClusterLogConf, ClusterClusterLogConfArgs
ClusterClusterLogConfDbfs, ClusterClusterLogConfDbfsArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterClusterLogConfS3, ClusterClusterLogConfS3Args
- Destination string
- Canned
Acl string - Enable
Encryption bool - Encryption
Type string - Endpoint string
- Kms
Key string - Region string
- Destination string
- Canned
Acl string - Enable
Encryption bool - Encryption
Type string - Endpoint string
- Kms
Key string - Region string
- destination String
- canned
Acl String - enable
Encryption Boolean - encryption
Type String - endpoint String
- kms
Key String - region String
- destination string
- canned
Acl string - enable
Encryption boolean - encryption
Type string - endpoint string
- kms
Key string - region string
- destination str
- canned_
acl str - enable_
encryption bool - encryption_
type str - endpoint str
- kms_
key str - region str
- destination String
- canned
Acl String - enable
Encryption Boolean - encryption
Type String - endpoint String
- kms
Key String - region String
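A minimal sketch of wiring up cluster log delivery; the DBFS path and the commented S3 variant are illustrative, and only one destination block would be used at a time:
import * as databricks from "@pulumi/databricks";

const logged = new databricks.Cluster("logged", {
    clusterName: "Logged",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    clusterLogConf: {
        // Logs are delivered under this DBFS path (illustrative).
        dbfs: {
            destination: "dbfs:/cluster-logs",
        },
        // An S3 alternative could look like:
        // s3: { destination: "s3://my-bucket/cluster-logs", region: "us-east-1", cannedAcl: "bucket-owner-full-control" },
    },
});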
ClusterClusterMountInfo, ClusterClusterMountInfoArgs
ClusterClusterMountInfoNetworkFilesystemInfo, ClusterClusterMountInfoNetworkFilesystemInfoArgs
- Server
Address string - Mount
Options string
- Server
Address string - Mount
Options string
- server
Address String - mount
Options String
- server
Address string - mount
Options string
- server_
address str - mount_
options str
- server
Address String - mount
Options String
ClusterDockerImage, ClusterDockerImageArgs
- url String
- basic
Auth Property Map
ClusterDockerImageBasicAuth, ClusterDockerImageBasicAuthArgs
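A hedged sketch of using a custom container image, assuming the basic auth block takes a username and password read from Pulumi config; the image URL and config keys are illustrative:
import * as pulumi from "@pulumi/pulumi";
import * as databricks from "@pulumi/databricks";

const cfg = new pulumi.Config();

const containerized = new databricks.Cluster("containerized", {
    clusterName: "Custom Container",
    sparkVersion: "14.3.x-scala2.12", // assumed; must be a runtime that supports custom containers
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    dockerImage: {
        url: "myregistry.example.com/spark-image:latest", // illustrative image reference
        basicAuth: {
            username: cfg.require("registryUser"),          // illustrative config keys
            password: cfg.requireSecret("registryPassword"),
        },
    },
});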
ClusterGcpAttributes, ClusterGcpAttributesArgs
- Availability string
- Boot
Disk intSize - Google
Service stringAccount - Local
Ssd intCount - Use
Preemptible boolExecutors Please use 'availability' instead.
- Zone
Id string
- Availability string
- Boot
Disk intSize - Google
Service stringAccount - Local
Ssd intCount - Use
Preemptible boolExecutors Please use 'availability' instead.
- Zone
Id string
- availability String
- boot
Disk IntegerSize - google
Service StringAccount - local
Ssd IntegerCount - use
Preemptible BooleanExecutors Please use 'availability' instead.
- zone
Id String
- availability string
- boot
Disk numberSize - google
Service stringAccount - local
Ssd numberCount - use
Preemptible booleanExecutors Please use 'availability' instead.
- zone
Id string
- availability str
- boot_
disk_ intsize - google_
service_ straccount - local_
ssd_ intcount - use_
preemptible_ boolexecutors Please use 'availability' instead.
- zone_
id str
- availability String
- boot
Disk NumberSize - google
Service StringAccount - local
Ssd NumberCount - use
Preemptible BooleanExecutors Please use 'availability' instead.
- zone
Id String
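A hedged sketch of the GCP attributes block; the availability value, service account, runtime version, and node type are illustrative assumptions:
import * as databricks from "@pulumi/databricks";

const gcpCluster = new databricks.Cluster("gcpCluster", {
    clusterName: "GCP Preemptible",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "n2-standard-4",      // assumed GCP node type
    numWorkers: 2,
    autoterminationMinutes: 20,
    gcpAttributes: {
        availability: "PREEMPTIBLE_WITH_FALLBACK_GCP", // assumed value
        googleServiceAccount: "cluster-sa@my-project.iam.gserviceaccount.com", // illustrative
        localSsdCount: 1,
    },
});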
ClusterInitScript, ClusterInitScriptArgs
- Abfss
Cluster
Init Script Abfss - Dbfs
Cluster
Init Script Dbfs For init scripts use 'volumes', 'workspace' or cloud storage location instead of 'dbfs'.
- File
Cluster
Init Script File - Gcs
Cluster
Init Script Gcs - S3
Cluster
Init Script S3 - Volumes
Cluster
Init Script Volumes - Workspace
Cluster
Init Script Workspace
- Abfss
Cluster
Init Script Abfss - Dbfs
Cluster
Init Script Dbfs For init scripts use 'volumes', 'workspace' or cloud storage location instead of 'dbfs'.
- File
Cluster
Init Script File - Gcs
Cluster
Init Script Gcs - S3
Cluster
Init Script S3 - Volumes
Cluster
Init Script Volumes - Workspace
Cluster
Init Script Workspace
- abfss
Cluster
Init Script Abfss - dbfs
Cluster
Init Script Dbfs For init scripts use 'volumes', 'workspace' or cloud storage location instead of 'dbfs'.
- file
Cluster
Init Script File - gcs
Cluster
Init Script Gcs - s3
Cluster
Init Script S3 - volumes
Cluster
Init Script Volumes - workspace
Cluster
Init Script Workspace
- abfss
Cluster
Init Script Abfss - dbfs
Cluster
Init Script Dbfs For init scripts use 'volumes', 'workspace' or cloud storage location instead of 'dbfs'.
- file
Cluster
Init Script File - gcs
Cluster
Init Script Gcs - s3
Cluster
Init Script S3 - volumes
Cluster
Init Script Volumes - workspace
Cluster
Init Script Workspace
- abfss
Cluster
Init Script Abfss - dbfs
Cluster
Init Script Dbfs For init scripts use 'volumes', 'workspace' or cloud storage location instead of 'dbfs'.
- file
Cluster
Init Script File - gcs
Cluster
Init Script Gcs - s3
Cluster
Init Script S3 - volumes
Cluster
Init Script Volumes - workspace
Cluster
Init Script Workspace
- abfss Property Map
- dbfs Property Map
For init scripts use 'volumes', 'workspace' or cloud storage location instead of 'dbfs'.
- file Property Map
- gcs Property Map
- s3 Property Map
- volumes Property Map
- workspace Property Map
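A minimal sketch attaching init scripts from the recommended workspace and volumes locations; the script paths, runtime version, and node type are illustrative:
import * as databricks from "@pulumi/databricks";

const withInitScripts = new databricks.Cluster("withInitScripts", {
    clusterName: "Init Scripts",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    initScripts: [
        // Workspace files and Unity Catalog volumes are preferred over the deprecated dbfs block.
        { workspace: { destination: "/Shared/init/install-libs.sh" } },           // illustrative path
        { volumes: { destination: "/Volumes/main/default/scripts/tune-os.sh" } }, // illustrative path
    ],
});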
ClusterInitScriptAbfss, ClusterInitScriptAbfssArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterInitScriptDbfs, ClusterInitScriptDbfsArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterInitScriptFile, ClusterInitScriptFileArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterInitScriptGcs, ClusterInitScriptGcsArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterInitScriptS3, ClusterInitScriptS3Args
- Destination string
- Canned
Acl string - Enable
Encryption bool - Encryption
Type string - Endpoint string
- Kms
Key string - Region string
- Destination string
- Canned
Acl string - Enable
Encryption bool - Encryption
Type string - Endpoint string
- Kms
Key string - Region string
- destination String
- canned
Acl String - enable
Encryption Boolean - encryption
Type String - endpoint String
- kms
Key String - region String
- destination string
- canned
Acl string - enable
Encryption boolean - encryption
Type string - endpoint string
- kms
Key string - region string
- destination str
- canned_
acl str - enable_
encryption bool - encryption_
type str - endpoint str
- kms_
key str - region str
- destination String
- canned
Acl String - enable
Encryption Boolean - encryption
Type String - endpoint String
- kms
Key String - region String
ClusterInitScriptVolumes, ClusterInitScriptVolumesArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterInitScriptWorkspace, ClusterInitScriptWorkspaceArgs
- Destination string
- Destination string
- destination String
- destination string
- destination str
- destination String
ClusterLibrary, ClusterLibraryArgs
- Cran
Cluster
Library Cran - Egg string
- Jar string
- Maven
Cluster
Library Maven - Pypi
Cluster
Library Pypi - Whl string
- Cran
Cluster
Library Cran - Egg string
- Jar string
- Maven
Cluster
Library Maven - Pypi
Cluster
Library Pypi - Whl string
- cran
Cluster
Library Cran - egg String
- jar String
- maven
Cluster
Library Maven - pypi
Cluster
Library Pypi - whl String
- cran
Cluster
Library Cran - egg string
- jar string
- maven
Cluster
Library Maven - pypi
Cluster
Library Pypi - whl string
- cran Property Map
- egg String
- jar String
- maven Property Map
- pypi Property Map
- whl String
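A minimal sketch attaching PyPI, Maven, and wheel libraries to a cluster, assuming the pypi block takes a package field; the package names, coordinates, paths, runtime version, and node type are illustrative:
import * as databricks from "@pulumi/databricks";

const withLibraries = new databricks.Cluster("withLibraries", {
    clusterName: "With Libraries",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    libraries: [
        { pypi: { package: "requests==2.31.0" } },                          // illustrative PyPI package
        { maven: { coordinates: "com.databricks:spark-xml_2.12:0.16.0" } }, // illustrative coordinates
        { whl: "dbfs:/FileStore/wheels/mylib-0.1.0-py3-none-any.whl" },     // illustrative wheel path
    ],
});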
ClusterLibraryCran, ClusterLibraryCranArgs
ClusterLibraryMaven, ClusterLibraryMavenArgs
- Coordinates string
- Exclusions List<string>
- Repo string
- Coordinates string
- Exclusions []string
- Repo string
- coordinates String
- exclusions List<String>
- repo String
- coordinates string
- exclusions string[]
- repo string
- coordinates str
- exclusions Sequence[str]
- repo str
- coordinates String
- exclusions List<String>
- repo String
ClusterLibraryPypi, ClusterLibraryPypiArgs
ClusterWorkloadType, ClusterWorkloadTypeArgs
ClusterWorkloadTypeClients, ClusterWorkloadTypeClientsArgs
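A minimal sketch, assuming the clients block accepts jobs and notebooks booleans, that restricts a cluster to job workloads only; all other values are illustrative placeholders:
import * as databricks from "@pulumi/databricks";

// Restrict the cluster to job workloads only (no interactive notebooks).
const jobsOnly = new databricks.Cluster("jobsOnly", {
    clusterName: "Jobs Only",
    sparkVersion: "14.3.x-scala2.12", // assumed runtime version
    nodeTypeId: "i3.xlarge",          // assumed node type
    numWorkers: 1,
    autoterminationMinutes: 20,
    workloadType: {
        clients: {
            jobs: true,       // assumed field name
            notebooks: false, // assumed field name
        },
    },
});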
Package Details
- Repository
- databricks pulumi/pulumi-databricks
- License
- Apache-2.0
- Notes
This Pulumi package is based on the
databricks
Terraform Provider.