spark sql saveAsTable overwrite issue

Raj D 616 Reputation points
2020-09-10T22:12:31.79+00:00

I am using the code below to create a table from a DataFrame in Databricks, and I run into an error.

df.write.saveAsTable("newtable")

This works fine the very first time, but to make it re-runnable I rewrote it as shown below. Both of the following variants throw the same error, even though this all worked as expected previously.

df.write.mode(SaveMode.Overwrite).saveAsTable("newtable")

or

df.write.mode("overwrite").saveAsTable("newtable")

I get the following error.

Error Message:

org.apache.spark.sql.AnalysisException: Can not create the managed table newtable. The associated location dbfs:/user/hive/warehouse/newtable already exists

Thank you in advance.


2 answers

  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2020-09-11T00:35:16.673+00:00

    Hi @Raj D ,

    Sorry you are experiencing this and thanks for reaching out in Microsoft Q&A forum.

    This problem could be due to a change in the default behavior of Spark version 2.4 (in Databricks Runtime 5.0 and above).

    This problem can occur if:

    1. The cluster is terminated while a write operation is in progress.
    2. A temporary network issue occurs.
    3. The job is interrupted.

    Once the metastore data for a particular table is corrupted, it is hard to recover except by dropping the files in that location manually. Basically, the problem is that a metadata directory called _STARTED isn’t deleted automatically when Azure Databricks tries to overwrite it.

    Recommended Solution:

    Please try setting the flag "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" to "true". This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook as shown below:

    spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")  
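    A minimal usage sketch (assuming PySpark, and the df and "newtable" names from your question): with the flag above enabled, the overwrite that previously failed can simply be retried in the same session.

    # retry the write that previously failed with "The associated location ... already exists"
    df.write.mode("overwrite").saveAsTable("newtable")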
    

    Or you can also try setting it at cluster level Spark configuration:

    spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation true  
    

    Another option is to manually clean up the data directory specified in the error message. You can do this with "dbutils.fs.rm".

    dbutils.fs.rm("<path-to-directory>", True)  
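    If you choose the manual cleanup route instead, a hedged end-to-end sketch (assuming PySpark and the dbfs:/user/hive/warehouse/newtable location reported in your error message) would be:

    # drop the table entry from the metastore if it is still registered
    spark.sql("DROP TABLE IF EXISTS newtable")
    # remove the leftover managed-table directory reported in the error
    dbutils.fs.rm("dbfs:/user/hive/warehouse/newtable", True)
    # retry the original overwrite
    df.write.mode("overwrite").saveAsTable("newtable")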
    

    Please refer to this documentation, which addresses this issue: Create table in overwrite mode fails when interrupted

    Hope this info helps. Let us know how it goes.

    Thank you

    ----------

    Please do consider to click on "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.

    1 person found this answer helpful.

  2. Johann Brinez Correa 0 Reputation points
    2023-02-17T13:45:21.7+00:00

    This happened to me when trying to overwrite a table with different data types. The table already existed, but I was overwriting it using a different table creation method. The first method created the table with certain column data types, while the second method defined different column data types for the same table. Short answer: validate column data types when overwriting.
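    A small sketch of that validation (assuming PySpark and the "newtable" name from the original question; spark.table() just reads the existing table's schema):

    # compare the existing table's column types with the incoming DataFrame before overwriting
    existing_schema = spark.table("newtable").schema
    if existing_schema != df.schema:
        print("Schema mismatch, review column data types before overwriting:")
        print("existing:", existing_schema.simpleString())
        print("incoming:", df.schema.simpleString())
    else:
        df.write.mode("overwrite").saveAsTable("newtable")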

