Hi @Raj D,
Sorry you are experiencing this, and thank you for reaching out on the Microsoft Q&A forum.
This problem can be caused by a change in the default behavior of Spark 2.4 (in Databricks Runtime 5.0 and above).
This problem can occur if:
- The cluster is terminated while a write operation is in progress.
- A temporary network issue occurs.
- The job is interrupted.
Once the metastore data for a particular table is corrupted, it is hard to recover except by dropping the files in that location manually. Basically, the problem is that a metadata directory called _STARTED isn’t deleted automatically when Azure Databricks tries to overwrite it.
Recommended Solution:
Please try setting the flag "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true". This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook as shown below:
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
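After the flag is set, you can re-run the write that was interrupted. A minimal sketch, assuming a hypothetical DataFrame df and managed table name my_table (replace these with your own names):

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")  # enable the legacy overwrite behavior
df.write.mode("overwrite").saveAsTable("my_table")  # retry the interrupted overwrite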
Alternatively, you can set it in the cluster-level Spark configuration:
spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation true
Another option is to manually clean up the data directory specified in the error message. You can do this with dbutils.fs.rm:
dbutils.fs.rm("<path-to-directory>", True)
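If you would like to verify what will be deleted first, here is a small sketch (the path below is hypothetical; use the location reported in your error message):

dbutils.fs.ls("dbfs:/user/hive/warehouse/my_table")  # inspect the leftover files, including the _STARTED directory
dbutils.fs.rm("dbfs:/user/hive/warehouse/my_table", True)  # recursively remove the directory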
Please refer to this documentation, which addresses this issue: Create table in overwrite mode fails when interrupted
Hope this info helps. Let us know how it goes.
Thank you
----------
Please consider clicking "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.