How to configure big data clusters settings post deployment
Applies to: SQL Server 2019 (15.x)
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.
Cluster, service, and resource scoped settings for SQL Server Big Data Clusters can be configured post-deployment through the azdata CLI. This functionality lets SQL Server Big Data Clusters administrators adjust configurations as workload requirements change. This article walks through example scenarios that configure the timezone and Spark workload settings. Post-deployment configuration follows a set, diff, apply flow.
Note
Post-deployment settings configuration is only available in SQL Server Big Data Clusters CU9 and later deployments. Settings configuration does not include scale, storage, or endpoint configuration. Options and instructions to configure SQL Server Big Data Clusters prior to CU9 can be found here.
Step by Step Scenario: Configure timezone on SQL Server Big Data Clusters
Starting with SQL Server Big Data Clusters CU13, you can customize the cluster timezone configuration so that service timestamps align with the selected timezone. The setting does not apply to the big data cluster control plane; it sets the new timezone configuration for all SQL Server pools (master, compute, and data), Hadoop components, and Spark.
Note
By default, SQL Server Big Data Clusters sets UTC as the timezone.
Use the following command to set the timezone configuration:
azdata bdc settings set --settings bdc.timezone=America/Los_Angeles
Apply the pending settings to the cluster
The following command applies the configuration and restarts all services. See the later sections of this article for how to track changes and control the configuration process.
azdata bdc settings apply
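The set, diff, apply flow for this scenario can be sketched as one shell function. This is a minimal sketch, not an official script: the function names `run` and `configure_timezone` and the `DRY_RUN` variable are hypothetical helpers introduced here, and only the three azdata commands themselves come from this article. By default the sketch prints the commands instead of running them, so you can review the sequence first.

```shell
#!/bin/sh
# Sketch of the set, diff, apply flow for the timezone setting.
# run, configure_timezone, and DRY_RUN are illustrative names, not part of azdata.

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

configure_timezone() {
  tz="${1:-}"
  if [ -z "$tz" ]; then
    echo "usage: configure_timezone <timezone-name>" >&2
    return 1
  fi
  # 1. Set: stage the new timezone value.
  run azdata bdc settings set --settings "bdc.timezone=${tz}" || return 1
  # 2. Diff: review every pending change before rollout.
  run azdata bdc settings show --filter-option=pending --include-details --recursive || return 1
  # 3. Apply: roll out the staged settings (this restarts services).
  run azdata bdc settings apply
}
```

Calling `configure_timezone America/Los_Angeles` prints the three commands in order; setting `DRY_RUN=0` first would execute them against the cluster you are logged in to.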
Step by Step Scenario: Configure the cluster to meet your Spark workload requirements
View the current configurations of the big data cluster Spark service
The following example shows how to view the user-configured settings of the Spark service. Optional parameters let you view all possible configurable settings, system-managed and all configurable settings, or pending settings. See the azdata bdc spark statement for more information.
azdata bdc spark settings show
Sample output
Spark Service
Setting | Running Value |
---|---|
spark-defaults-conf.spark.driver.cores | 1 |
spark-defaults-conf.spark.driver.memory | 1664m |
Change the default number of cores and memory for the Spark driver
Update the default number of cores to two and the default memory to 7424 MB for the Spark service. Because the setting is scoped to the Spark service, it affects every resource that runs Spark.
azdata bdc spark settings set --settings spark-defaults-conf.spark.driver.cores=2,spark-defaults-conf.spark.driver.memory=7424m
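The command above passes several settings as one comma-separated `--settings` value. When the list grows, it can be easier to keep one setting per argument and join them in a helper. The following is a minimal sketch; `join_settings` is a hypothetical helper name, not an azdata feature, and it only checks that each argument contains an equals sign.

```shell
#!/bin/sh
# Join KEY=VALUE arguments into the comma-separated form used by --settings.
# join_settings is an illustrative helper, not part of azdata.
join_settings() {
  joined=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;  # looks like KEY=VALUE, accept it
      *) echo "not in KEY=VALUE form: $pair" >&2; return 1 ;;
    esac
    if [ -z "$joined" ]; then
      joined="$pair"
    else
      joined="${joined},${pair}"
    fi
  done
  printf '%s\n' "$joined"
}

# Example use (not executed here):
#   azdata bdc spark settings set --settings "$(join_settings \
#     spark-defaults-conf.spark.driver.cores=2 \
#     spark-defaults-conf.spark.driver.memory=7424m)"
```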
Change the default number of cores for the Spark executors in the Storage Pool
Update the default number of executor cores to 4 for the Storage Pool.
azdata bdc spark settings set --settings spark-defaults-conf.spark.executor.cores=4 --resource=storage-0
Configure additional paths to the default classpath of Spark applications
The /opt/hadoop/share/hadoop/tools/lib/ path contains several libraries that your Spark applications can use, but it is not loaded into the classpath of Spark applications by default. To add it, apply the following configuration pattern.
azdata bdc hdfs settings set --settings hadoop-env.HADOOP_CLASSPATH="/opt/hadoop/share/hadoop/tools/lib/*"
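The double quotes in the command above matter because the value ends in an asterisk, which a shell may otherwise try to expand as a filename pattern. As a small sketch, the value can be built in one place so the asterisk always stays literal. The helper name `classpath_setting` is hypothetical and not part of azdata; only the setting name and path come from this article.

```shell
#!/bin/sh
# Build the HADOOP_CLASSPATH setting for a library directory.
# classpath_setting is an illustrative helper, not part of azdata.
classpath_setting() {
  # ${1%/} drops one trailing slash, so "lib" and "lib/" give the same result.
  # The asterisk sits inside the quoted printf format, so the shell never globs it.
  printf 'hadoop-env.HADOOP_CLASSPATH=%s/*\n' "${1%/}"
}

# Example use (not executed here); keep the command substitution quoted:
#   azdata bdc hdfs settings set --settings "$(classpath_setting /opt/hadoop/share/hadoop/tools/lib/)"
```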
View the pending settings changes staged in the big data cluster
View the pending settings changes for the Spark service only, or across the entire big data cluster.
Pending Spark Service Settings
azdata bdc spark settings show --filter-option=pending --include-details
Spark Service
Setting | Running Value | Configured Value | Configurable | Configured | Last Updated Time |
---|---|---|---|---|---|
spark-defaults-conf.spark.driver.cores | 1 | 2 | true | true | |
spark-defaults-conf.spark.driver.memory | 1664m | 7424m | true | true | |
All Pending Settings
azdata bdc settings show --filter-option=pending --include-details --recursive
Spark Service Settings - Pending
Setting | Running Value | Configured Value | Configurable | Configured | Last Updated Time |
---|---|---|---|---|---|
spark-defaults-conf.spark.driver.cores | 1 | 2 | true | true | |
spark-defaults-conf.spark.driver.memory | 1664m | 7424m | true | true | |
Storage-0 Resource Spark Settings - Pending
Setting | Running Value | Configured Value | Configurable | Configured | Last Updated Time |
---|---|---|---|---|---|
spark-defaults-conf.spark.executor.cores | 1 | 4 | true | true | |
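For a quick review of the diff step, the pending tables can be condensed to one line per changed setting. The sketch below assumes the output has been saved as pipe-delimited text with one setting per row and the columns Setting, Running Value, Configured Value in that order, as in the sample tables in this article. The actual azdata output format may differ, and `pending_changes` is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize pending changes from pipe-delimited table text read on stdin.
# Assumes columns: Setting | Running Value | Configured Value | ...
# pending_changes is an illustrative helper, not part of azdata.
pending_changes() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 3 {
      setting = trim($1); running = trim($2); configured = trim($3)
      # Skip the header row and the dashed separator row.
      if (setting == "Setting" || setting ~ /^-+$/) next
      # Report only rows whose configured value differs from the running value.
      if (setting != "" && running != configured)
        printf "%s: %s -> %s\n", setting, running, configured
    }
  '
}

# Example use (not executed here):
#   azdata bdc settings show --filter-option=pending --include-details --recursive | pending_changes
```

Section title lines such as "Spark Service Settings - Pending" contain no pipe characters, so the sketch ignores them.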
Apply the pending settings to the big data cluster
azdata bdc settings apply
Monitor the configuration update status
azdata bdc status show
Optional steps
Revert pending configuration settings
If you decide that you no longer want to change the pending configuration settings, you can unstage them. This reverts the pending settings at all scopes.
azdata bdc settings revert
Abort the configuration upgrade
If the configuration upgrade fails for any of the components, you can cancel the upgrade process and return the cluster to its prior configuration. Settings that were staged for change during the upgrade are listed as pending settings again.
azdata bdc settings cancel-apply
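The apply and cancel steps can be combined so that a failed apply is followed by a cancel. The sketch below assumes that `azdata bdc settings apply` exits with a non-zero status when the upgrade fails, which this article does not state, so verify that behavior before relying on it. The function name `apply_or_cancel` and the `AZDATA` override variable are hypothetical and exist only to make the sketch testable.

```shell
#!/bin/sh
# Apply pending settings; if the apply fails, cancel it so the cluster
# returns to its prior configuration and the settings stay pending.
# ASSUMPTION: azdata exits non-zero when the apply fails.
# apply_or_cancel and AZDATA are illustrative names, not part of azdata.
apply_or_cancel() {
  cli="${AZDATA:-azdata}"
  if "$cli" bdc settings apply; then
    echo "apply succeeded"
  else
    echo "apply failed; cancelling the configuration upgrade" >&2
    "$cli" bdc settings cancel-apply
    return 1
  fi
}
```

After either outcome, `azdata bdc status show` reports the state of the cluster.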