How to configure Big Data Clusters settings post-deployment

Applies to: SQL Server 2019 (15.x)

Important

The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.

Cluster-, service-, and resource-scoped settings for SQL Server Big Data Clusters can be configured after deployment through the azdata CLI. This lets administrators adjust configurations as workload requirements change. This article walks through example scenarios for configuring the cluster timezone and Spark workload settings. Post-deployment configuration follows a set, diff, apply flow: you stage new values, review the pending changes, and then apply them to the cluster.
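The set, diff, apply flow can be sketched as a small POSIX shell function. This is a minimal sketch, not part of azdata: the function name configure_bdc is hypothetical, it assumes azdata is on the PATH and that you are already logged in to the cluster, and the AZDATA variable is an added assumption that lets you set AZDATA=echo to print the commands instead of running them. Every azdata command it calls appears elsewhere in this article.

```shell
#!/bin/sh
# Minimal sketch of the set -> diff -> apply flow (hypothetical helper).
# Assumes azdata is on PATH and you are logged in to the cluster.
# Set AZDATA=echo to print the commands instead of running them.
configure_bdc() {
  # $1: comma-separated cluster-scope settings, e.g. bdc.timezone=UTC
  # $2: pass --apply to apply after reviewing; otherwise stop after the diff
  if [ -z "$1" ]; then
    echo "usage: configure_bdc key=value[,key=value...] [--apply]" >&2
    return 2
  fi
  # set: stage the new values; the running cluster is not changed yet
  "${AZDATA:-azdata}" bdc settings set --settings "$1" || return
  # diff: list everything staged but not yet applied, at every scope
  "${AZDATA:-azdata}" bdc settings show --filter-option=pending \
    --include-details --recursive || return
  # apply: only when explicitly requested, so the diff can be reviewed first
  if [ "$2" = "--apply" ]; then
    "${AZDATA:-azdata}" bdc settings apply
  fi
}
```

Running it once without --apply stages the values and shows the pending diff; after reviewing it, run azdata bdc settings apply yourself or rerun the function with --apply.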

Note

Post-deployment settings configuration is only available in SQL Server Big Data Clusters CU9 and later deployments. It does not cover scale, storage, or endpoint configuration. Options and instructions for configuring SQL Server Big Data Clusters before CU9 are documented separately.

Step-by-step scenario: Configure the timezone on SQL Server Big Data Clusters

Starting with SQL Server Big Data Clusters CU13, you can customize the cluster timezone so that service timestamps align with the selected timezone. The setting doesn't apply to the big data cluster control plane; it sets the new timezone for all SQL Server pools (master, compute, and data), Hadoop components, and Spark.

Note

By default, SQL Server Big Data Clusters sets UTC as the timezone.

Use the following command to set the timezone configuration:

azdata bdc settings set --settings bdc.timezone=America/Los_Angeles
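A small wrapper can catch obviously malformed values before they are staged. The sketch below is hypothetical: set_bdc_timezone and the AZDATA override are assumptions, and the check only allows the characters that appear in IANA-style names such as America/Los_Angeles or UTC. It does not confirm that the zone exists or that the cluster accepts it.

```shell
#!/bin/sh
# Hypothetical helper: stage the cluster timezone after a basic sanity check.
# The check only allows characters used in IANA-style names
# (e.g. America/Los_Angeles, UTC); it does not prove the zone exists.
# Set AZDATA=echo to print the command instead of running it.
set_bdc_timezone() {
  case "$1" in
    '' | *[!A-Za-z0-9_+/-]*)
      # empty, or contains a space, comma, or other unexpected character
      echo "invalid timezone name: '$1'" >&2
      return 2
      ;;
  esac
  "${AZDATA:-azdata}" bdc settings set --settings "bdc.timezone=$1"
}
```

For example, set_bdc_timezone America/Los_Angeles stages the same setting as the command above, while a value such as "Pacific Time" is rejected before azdata is called.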

Apply the pending settings to the cluster

The following command applies the configuration and restarts all services. See the last sections of this article for how to track changes and control the configuration process.

azdata bdc settings apply
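Because applying settings restarts services, scripts may want an explicit confirmation step. The wrapper below is a hypothetical sketch, not an azdata feature: confirm_and_apply and the AZDATA override are assumptions, and the answer is read from standard input.

```shell
#!/bin/sh
# Hypothetical wrapper: apply can restart services, so ask first.
# Reads the answer from stdin; anything other than y/yes leaves the
# settings staged. Set AZDATA=echo to print the command instead of running it.
confirm_and_apply() (
  # the ( ) body is a subshell, so "answer" and exit stay local to it
  printf 'Apply pending settings now? Services may restart. [y/N] ' >&2
  answer=
  read -r answer || :
  case "$answer" in
    y | Y | yes | YES)
      "${AZDATA:-azdata}" bdc settings apply
      ;;
    *)
      echo "Not applied; pending settings remain staged." >&2
      exit 1
      ;;
  esac
)
```

Declining leaves the staged settings untouched, so you can review them or run azdata bdc settings revert instead.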

Step-by-step scenario: Configure the cluster to meet your Spark workload requirements

View the current configuration of the big data cluster Spark service

The following example shows how to view the user-configured settings of the Spark service. Optional parameters also let you view all configurable settings, all settings including system-managed ones, or only pending settings. For more information, see azdata bdc spark statement.

azdata bdc spark settings show

Sample output

Spark Service

Setting                                   Running Value
spark-defaults-conf.spark.driver.cores    1
spark-defaults-conf.spark.driver.memory   1664m

Change the default number of cores and memory for the Spark driver

Update the Spark service's default driver cores to 2 and default driver memory to 7424 MB. Because this is a service-scoped setting, it applies to every resource that runs Spark.

azdata bdc spark settings set --settings spark-defaults-conf.spark.driver.cores=2,spark-defaults-conf.spark.driver.memory=7424m
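As the command above shows, several settings are passed to --settings as one comma-separated list. When a script holds them as separate key=value strings, a helper can join them. The names join_settings and set_spark_settings, and the AZDATA override, are hypothetical; this is a sketch, not part of azdata.

```shell
#!/bin/sh
# Hypothetical helpers; set AZDATA=echo to print the command instead of running it.

# join_settings: join key=value arguments with commas.
# The ( ) body runs in a subshell, so the IFS change does not leak out.
join_settings() (
  IFS=,
  printf '%s\n' "$*"
)

# set_spark_settings: stage one or more service-scope Spark settings.
set_spark_settings() {
  if [ "$#" -eq 0 ]; then
    echo "usage: set_spark_settings key=value..." >&2
    return 2
  fi
  "${AZDATA:-azdata}" bdc spark settings set --settings "$(join_settings "$@")"
}
```

Calling set_spark_settings spark-defaults-conf.spark.driver.cores=2 spark-defaults-conf.spark.driver.memory=7424m produces the same azdata command as the example above.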

Change the default number of cores for the Spark executors in the storage pool

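Update the default number of executor cores to 4 for the storage pool only. The --resource parameter limits the setting to the storage-0 resource instead of the whole Spark service.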

azdata bdc spark settings set --settings spark-defaults-conf.spark.executor.cores=4 --resource=storage-0
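The resource-scoped variant differs only by the --resource parameter. The hypothetical helper below (set_spark_resource_settings and the AZDATA override are assumptions) takes the resource name first and the key=value pairs after it, joining them with commas as above.

```shell
#!/bin/sh
# Hypothetical helper: stage Spark settings for one resource only.
# Usage: set_spark_resource_settings RESOURCE key=value...
# Set AZDATA=echo to print the command instead of running it.
set_spark_resource_settings() (
  # the ( ) body is a subshell, so exit and variables stay local
  if [ "$#" -lt 2 ] || [ -z "$1" ]; then
    echo "usage: set_spark_resource_settings RESOURCE key=value..." >&2
    exit 2
  fi
  resource=$1
  shift
  # join the remaining key=value arguments with commas
  settings=$(IFS=,; printf '%s' "$*")
  "${AZDATA:-azdata}" bdc spark settings set --settings "$settings" --resource="$resource"
)
```

For example, set_spark_resource_settings storage-0 spark-defaults-conf.spark.executor.cores=4 issues the command above.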

Configure additional paths to the default classpath of Spark applications

The /opt/hadoop/share/hadoop/tools/lib/ path contains several libraries that your Spark applications can use, but this path isn't included in the classpath of Spark applications by default. To add it, stage the following setting.

azdata bdc hdfs settings set --settings hadoop-env.HADOOP_CLASSPATH="/opt/hadoop/share/hadoop/tools/lib/*"
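The value ends in *, which a local shell may expand into file names if it is passed unquoted. The hypothetical helper below (set_hadoop_classpath and the AZDATA override are assumptions) always passes the setting as a single quoted argument; callers should still quote the path they pass in, for example '/opt/hadoop/share/hadoop/tools/lib/*'.

```shell
#!/bin/sh
# Hypothetical helper for the HDFS classpath setting shown above.
# The value is passed as one double-quoted argument, so the trailing *
# reaches azdata literally instead of being expanded by the shell.
# Set AZDATA=echo to print the command instead of running it.
set_hadoop_classpath() {
  if [ -z "$1" ]; then
    echo "usage: set_hadoop_classpath 'PATH'" >&2
    return 2
  fi
  "${AZDATA:-azdata}" bdc hdfs settings set --settings "hadoop-env.HADOOP_CLASSPATH=$1"
}
```

Usage: set_hadoop_classpath '/opt/hadoop/share/hadoop/tools/lib/*'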

View the pending settings changes staged in the big data cluster

View the pending settings changes for the Spark service only, and then across the entire big data cluster.

Pending Spark Service Settings

azdata bdc spark settings show --filter-option=pending --include-details

Spark Service

Setting                                   Running Value  Configured Value  Configurable  Configured  Last Updated Time
spark-defaults-conf.spark.driver.cores    1              2                 true          true
spark-defaults-conf.spark.driver.memory   1664m          7424m             true          true

All Pending Settings

azdata bdc settings show --filter-option=pending --include-details --recursive

Spark Service Settings - Pending

Setting                                   Running Value  Configured Value  Configurable  Configured  Last Updated Time
spark-defaults-conf.spark.driver.cores    1              2                 true          true
spark-defaults-conf.spark.driver.memory   1664m          7424m             true          true

Storage-0 Resource Spark Settings - Pending

Setting                                   Running Value  Configured Value  Configurable  Configured  Last Updated Time
spark-defaults-conf.spark.executor.cores  1              4                 true          true
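The two pending-settings commands above differ only in scope. A hypothetical helper (show_pending and the AZDATA override are assumptions) can wrap both: with no argument it shows pending settings at every scope, and with a service name it shows only that service. It assumes that services other than spark accept the same settings show options, which this article only demonstrates for spark.

```shell
#!/bin/sh
# Hypothetical helper: show staged but not yet applied settings.
#   show_pending        -> every scope (cluster, services, resources)
#   show_pending spark  -> one service only
# Assumption: other services accept the same "settings show" options as spark.
# Set AZDATA=echo to print the command instead of running it.
show_pending() {
  if [ -z "$1" ]; then
    "${AZDATA:-azdata}" bdc settings show --filter-option=pending --include-details --recursive
  else
    "${AZDATA:-azdata}" bdc "$1" settings show --filter-option=pending --include-details
  fi
}
```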

Apply the pending settings to the big data cluster

azdata bdc settings apply

Monitor the configuration update status

azdata bdc status show
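While an apply is in progress, you can rerun the status command periodically. The sketch below is hypothetical (watch_bdc_status, its defaults, and the AZDATA override are assumptions); it only repeats azdata bdc status show a fixed number of times and does not interpret the output.

```shell
#!/bin/sh
# Hypothetical helper: print the cluster status a fixed number of times
# while an apply is in progress. It does not parse the output; read each
# report and press Ctrl+C to stop early once the update has finished.
# Usage: watch_bdc_status [CHECKS] [SECONDS_BETWEEN_CHECKS]  (defaults: 10, 30)
# Set AZDATA=echo to print the command instead of running it.
watch_bdc_status() (
  checks=${1:-10}
  interval=${2:-30}
  i=1
  while [ "$i" -le "$checks" ]; do
    "${AZDATA:-azdata}" bdc status show
    # sleep between checks, but not after the last one
    if [ "$i" -lt "$checks" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
)
```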

Optional steps

Revert pending configuration settings

If you decide not to apply the pending configuration settings, you can unstage them. This reverts the pending settings at all scopes.

azdata bdc settings revert

Abort the configuration upgrade

If the configuration upgrade fails for any component, you can cancel the upgrade and return the cluster to its previous configuration. Settings that were staged for the upgrade are then listed as pending settings again.

azdata bdc settings cancel-apply
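The two recovery commands fit different stages: revert unstages settings that were never applied, and cancel-apply rolls back an apply that failed. The hypothetical helper below (undo_settings, its staged/applying arguments, and the AZDATA override are assumptions) maps each situation to the matching command.

```shell
#!/bin/sh
# Hypothetical helper that maps the situation to the matching azdata command:
#   staged    settings were set but not applied -> revert (unstage, all scopes)
#   applying  an apply failed for some component -> cancel-apply (roll back)
# Set AZDATA=echo to print the command instead of running it.
undo_settings() {
  case "$1" in
    staged)   "${AZDATA:-azdata}" bdc settings revert ;;
    applying) "${AZDATA:-azdata}" bdc settings cancel-apply ;;
    *)
      echo "usage: undo_settings staged|applying" >&2
      return 2
      ;;
  esac
}
```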

Next steps

Configure a SQL Server Big Data Cluster