Why can't I set spark.cleaner.referenceTracking.cleanCheckpoints in Databricks?

Ping Xiao 21 Reputation points
2021-07-05T16:11:44.56+00:00

I am using dataframe.checkpoint() and would like to set spark.cleaner.referenceTracking.cleanCheckpoints to true. When I call spark.conf.set('spark.cleaner.referenceTracking.cleanCheckpoints', 'true'), I get "Cannot modify the value of a Spark config: spark.cleaner.referenceTracking.cleanCheckpoints". Why is that? What is the default value of spark.cleaner.referenceTracking.cleanCheckpoints in Databricks?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,471 Reputation points Microsoft Employee
    2021-07-06T08:49:50.497+00:00

    Hello @Ping Xiao ,

    Welcome to the Microsoft Q&A platform.

    Note: Some Spark properties must be set when the cluster starts and cannot be changed at runtime with spark.conf.set().

    By default, spark.cleaner.referenceTracking.cleanCheckpoints is set to false.
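As a minimal sketch of how to inspect this from a notebook: the helper below (the name `describe_conf` is hypothetical, not part of Spark) reads the current value of a property and asks Spark whether it can be changed at runtime. It assumes a runtime where PySpark's `spark.conf.isModifiable` is available (Spark 2.4 or later) and that `spark` is the session predefined in a Databricks notebook.

```python
def describe_conf(conf, key):
    """Return the current value of a Spark property and whether it is
    modifiable at runtime.

    `conf` is expected to behave like PySpark's `spark.conf`
    (it must provide `get(key, default)` and `isModifiable(key)`).
    """
    return {
        "key": key,
        # Fall back to a marker string when the property is not set explicitly.
        "value": conf.get(key, "<not set>"),
        # False means spark.conf.set() will be rejected for this key.
        "modifiable": conf.isModifiable(key),
    }


# Usage in a notebook (assumes the predefined `spark` session):
# info = describe_conf(spark.conf, "spark.cleaner.referenceTracking.cleanCheckpoints")
# print(info)
```

If `modifiable` comes back as false, the property has to be supplied in the cluster configuration rather than through spark.conf.set().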

    If you want to set spark.cleaner.referenceTracking.cleanCheckpoints to true, set it in the Spark Config field under Advanced Options in the cluster configuration.

    112161-image.png

    Before and after configuring the Spark configuration under Advanced options in the cluster:

    112125-image.png
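For reference, the Spark Config field takes one property per line, with the key and value separated by a space. The small sketch below only illustrates that format; the helper name `to_spark_config_lines` is hypothetical and not part of any Databricks or Spark API.

```python
def to_spark_config_lines(settings):
    """Format a dict of Spark properties as 'key value' lines, the layout
    expected by the Spark Config field in the cluster's Advanced Options."""
    lines = []
    for key, value in settings.items():
        # Spark expects lowercase 'true'/'false' for boolean properties.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key} {value}")
    return "\n".join(lines)


print(to_spark_config_lines(
    {"spark.cleaner.referenceTracking.cleanCheckpoints": True}
))
# prints: spark.cleaner.referenceTracking.cleanCheckpoints true
```

The printed line can be pasted into the Spark Config field; the cluster needs to be restarted for the change to take effect.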

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.

    1 person found this answer helpful.
