Why is a Databricks cluster creating more executors after all jobs have completed?

Manash 51 Reputation points
2023-12-07T14:05:15.3033333+00:00

I am running a PySpark script in which nearly 13 CSV files are read from Azure Data Lake and joined one by one; the final result is written to the same data lake, in a different container, as a Parquet file. I repeated this process several times and noticed in the Spark UI each time that more executors are created a few minutes after my jobs complete. No one else is using my cluster, and no more jobs were triggered after the final file was written. This looks strange to me.

Why are more executors being allocated after all jobs have completed?

Spark

Azure Databricks

Accepted answer
  1. Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator
    2023-12-21T22:44:00.4666667+00:00

    Hello Manash,

    When dynamic allocation is enabled, Spark may acquire many more executors than expected. Dynamic allocation means that your application may give resources back to the cluster when they are no longer used and request them again later when there is demand.
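To check whether dynamic allocation is in play, it can help to look at the relevant Spark properties. The following is only a minimal sketch: the property names are standard Spark settings, but the helper `dynamic_allocation_settings` and the example values are hypothetical and used purely for illustration. On Azure Databricks the worker count is typically also governed by the cluster's autoscaling configuration, which this sketch does not inspect.

```python
# Standard Spark property names related to dynamic allocation.
# The helper below is a hypothetical illustration, not part of any library.
DYNAMIC_ALLOCATION_KEYS = (
    "spark.dynamicAllocation.enabled",
    "spark.dynamicAllocation.minExecutors",
    "spark.dynamicAllocation.maxExecutors",
    "spark.dynamicAllocation.executorIdleTimeout",
    "spark.dynamicAllocation.schedulerBacklogTimeout",
)


def dynamic_allocation_settings(conf):
    """Return the dynamic allocation entries of a Spark configuration.

    `conf` is a plain dict of property name -> value. Keys that are not
    set explicitly are reported as None (Spark then applies its default).
    """
    return {key: conf.get(key) for key in DYNAMIC_ALLOCATION_KEYS}


# In a notebook, the dict could be built from the live session, assuming
# a SparkSession named `spark` exists:
#   conf = dict(spark.sparkContext.getConf().getAll())
# A hand-written dict with made-up values is used here so the sketch
# runs without a cluster.
example_conf = {
    "spark.app.name": "csv-join-job",
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.maxExecutors": "8",
}

settings = dynamic_allocation_settings(example_conf)
for key, value in settings.items():
    print(f"{key} = {value}")
```

If `spark.dynamicAllocation.enabled` is reported as `true`, executors being released and requested again around the end of a job would be consistent with the behaviour described above.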

    My understanding is that if the additional executors were created but not used during job execution, you will not be charged for them.

    Another possible reason for the additional executors could be garbage collection. When Spark runs a job, it creates a number of objects in memory. These objects are managed by the Java Virtual Machine (JVM) and are periodically cleaned up by the garbage collector. If the garbage collector cannot keep up with the rate of object creation, the JVM may run out of memory. To prevent this, Spark may add executors to the pool to handle the increased workload.

    Please refer to the document below:
    https://www.databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
