Set Spark configuration properties on Azure Databricks
You can set Spark configuration properties (Spark confs) to customize settings in your compute environment.
Databricks generally recommends against configuring most Spark properties. Especially when migrating from open-source Apache Spark or upgrading Databricks Runtime versions, legacy Spark configurations can override new default behaviors that optimize workloads.
For many behaviors controlled by Spark properties, Azure Databricks also provides options to either enable the behavior at a table level or to configure custom behavior as part of a write operation. For example, schema evolution was previously controlled by a Spark property, but can now be enabled directly in SQL, Python, and Scala. See Schema evolution syntax for merge.
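For example, instead of enabling automatic schema migration globally with a Spark property, you can scope it to a single merge. The following is a minimal Python sketch, assuming a Databricks Runtime whose Delta Lake version supports the withSchemaEvolution() merge builder method, and hypothetical tables named target and updates:

Python
# Enable schema evolution for this merge only, instead of setting a
# session-wide Spark property.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "target")   # hypothetical Delta table
updates = spark.read.table("updates")          # hypothetical source table

(
    target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .withSchemaEvolution()                     # scope schema evolution to this operation
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)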
Configure Spark properties for notebooks and jobs
You can set Spark properties for notebooks and jobs. The scope of the configuration depends on how you set it.
| Properties configured | Applies to |
|---|---|
| Using compute configuration | All notebooks and jobs run with the compute resource. |
| Within a notebook | Only the SparkSession for the current notebook. |
For instructions on configuring Spark properties at the compute level, see Spark configuration.
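As a sketch of the format, the Spark config field in the compute configuration UI accepts one property per line, with the key and value separated by a space. The properties shown here are illustrative examples, not recommendations:

spark.sql.ansi.enabled true
spark.sql.shuffle.partitions 200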
To set a Spark property in a notebook, use the following syntax:
SQL
SET spark.sql.ansi.enabled = true
Python
spark.conf.set("spark.sql.ansi.enabled", "true")
Scala
spark.conf.set("spark.sql.ansi.enabled", "true")
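To verify the value in effect for the current SparkSession, you can read the property back. A minimal Python sketch:

Python
# Returns the value that applies to this notebook's SparkSession.
current_value = spark.conf.get("spark.sql.ansi.enabled")
print(current_value)  # "true" if set as shown above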
Configure Spark properties in Databricks SQL
Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Enable data access configuration.
Other than data access configurations, Databricks SQL only allows a handful of Spark confs, which have been aliased to shorter names for simplicity. See Configuration parameters.
For most supported SQL configurations, you can override the global behavior in your current session. The following example turns off ANSI mode:
SET ANSI_MODE = false
Configure Spark properties for Delta Live Tables pipelines
Delta Live Tables allows you to configure Spark properties for a pipeline, for one compute resource configured for a pipeline, or for individual flows, materialized views, or streaming tables.
You can set pipeline and compute Spark properties using the UI or JSON. See Configure a Delta Live Tables pipeline.
Use the spark_conf option in DLT decorator functions to configure Spark properties for flows, views, or tables. See Python Delta Live Tables properties.
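For example, the following minimal Python sketch sets a Spark property for the query that defines a single dataset. The source table my_catalog.my_schema.raw_events is hypothetical:

Python
import dlt

# The spark_conf dictionary applies only to the query defining this dataset.
@dlt.table(
    spark_conf={"spark.sql.shuffle.partitions": "100"}
)
def filtered_events():
    # Hypothetical source table; replace with your own.
    return spark.read.table("my_catalog.my_schema.raw_events").where("event_type = 'click'")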
Configure Spark properties for serverless notebooks and jobs
Serverless compute does not support setting most Spark properties for notebooks or jobs. You can configure only the following properties (see the example after this list):
- spark.sql.legacy.timeParserPolicy (Default value is EXCEPTION)
- spark.sql.session.timeZone (Default value is Etc/UTC)
- spark.sql.shuffle.partitions (Default value is auto)
- spark.sql.ansi.enabled (Default value is true)
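For example, a minimal Python sketch that sets two of the supported properties in a serverless notebook:

Python
# Only the properties listed above can be configured on serverless compute.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.conf.set("spark.sql.shuffle.partitions", "200")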