Managed table overwrites existing location for delta but not for others

Dhruv Singla
2024-02-08T09:11:34.64+00:00

I am working on Azure Databricks with Databricks Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12) and am facing the following issue. Suppose I have a view named v1 and a database f1_processed created with the following command:

CREATE DATABASE IF NOT EXISTS f1_processed
LOCATION "abfss://processed@formula1dl679student.dfs.core.windows.net/"

This creates the database in the container named processed. Suppose that container already has a folder named circuits. I then create a managed table in Parquet format from a DataFrame, whose location is that circuits folder, using the command below:

circuits_final_df.write.mode("overwrite").format("parquet").saveAsTable("f1_processed.circuits")

It fails with the following error:

SparkRuntimeException: [LOCATION_ALREADY_EXISTS] Cannot name the managed table as 
`spark_catalog`.`f1_processed`.`circuits`, as its associated location 
'abfss://processed@formula1dl679student.dfs.core.windows.net/circuits' already exists. 
Please pick a different table name, or remove the existing location first. SQLSTATE: 42710
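The error message offers two ways out: pick a different table name, or remove the existing location first. Below is a minimal local-filesystem sketch of the rule behind the error. It is written in plain Python rather than Spark so that it runs anywhere. It assumes, based on our understanding of Spark's behavior, that a managed table without an explicit LOCATION is placed at `<database location>/<table name>`, and that Spark refuses to create it when that folder exists and is not empty. `LocationAlreadyExistsError`, `managed_table_location`, `check_managed_location` and `remove_existing_location` are hypothetical names, and a temporary directory stands in for the processed container.

```python
import shutil
import tempfile
from pathlib import Path


class LocationAlreadyExistsError(RuntimeError):
    """Hypothetical stand-in for Spark's [LOCATION_ALREADY_EXISTS] error."""


def managed_table_location(database_location: str, table_name: str) -> str:
    # Assumed default location of a managed table: <database location>/<table name>.
    return database_location.rstrip("/") + "/" + table_name


def check_managed_location(location: Path) -> None:
    # Assumed rule: the folder must be missing or empty, otherwise creation is refused.
    if location.exists() and any(location.iterdir()):
        raise LocationAlreadyExistsError(f"location '{location}' already exists")


def remove_existing_location(location: Path) -> None:
    # The remedy the error message suggests: remove the existing folder first.
    if location.exists():
        shutil.rmtree(location)


# Simulate the "processed" container with a local temporary directory.
with tempfile.TemporaryDirectory() as container:
    location = Path(managed_table_location(container, "circuits"))
    location.mkdir()
    (location / "part-00000.parquet").write_bytes(b"old data")  # leftover file

    try:
        check_managed_location(location)
        blocked_before_cleanup = False
    except LocationAlreadyExistsError:
        blocked_before_cleanup = True

    remove_existing_location(location)
    try:
        check_managed_location(location)  # the folder no longer exists
        allowed_after_cleanup = True
    except LocationAlreadyExistsError:
        allowed_after_cleanup = False

print(blocked_before_cleanup, allowed_after_cleanup)  # True True
```

On Databricks, the corresponding clean-up would presumably be deleting the folder with `dbutils.fs.rm(path, True)` on the abfss path before calling `saveAsTable` again. That deletion is irreversible, so it only makes sense if the old files are no longer needed.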

However, the same write in Delta format runs without errors:

circuits_final_df.write.mode("overwrite").format("delta").saveAsTable("f1_processed.circuits")

Also, creating this Delta table does not remove any existing files from the folder; it only adds the new files. Because the folder then mixes the existing data with the new data, this looks like a bug to me. Any help is appreciated.

Azure Databricks

Accepted answer
  1. Amira Bedhiafi
    2024-02-08T19:37:36.1133333+00:00

    What you are seeing is expected. With the overwrite mode, Delta does not directly delete or replace the files in the folder. Instead, it writes new data files and records the change in the table's transaction log, which is what makes it possible to revert to previous versions of the Delta table. It seems you already have a table stored in Delta format at the specified path, and overwriting it directly in Parquet format is not possible. To create a Parquet table there, first remove the existing folder at the ABFSS location; after that, you will be able to create the new Parquet table in that location.

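To illustrate the accepted answer's point that an overwrite leaves old files in place while the table itself still shows only the new data, here is a toy Python model of a log-based table. It is only an analogy: the class `ToyDeltaTable` and its methods are hypothetical, and the real Delta Lake transaction log (JSON commit files under `_delta_log` holding add/remove actions, plus checkpoints) is far richer. The idea it captures is that readers reconstruct a table version by replaying the log, so logically removed files stay on disk (which enables time travel) until a separate clean-up step, such as Delta's VACUUM, deletes them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy model of a log-based table; NOT Delta Lake's real implementation."""

    files_on_disk: set = field(default_factory=set)
    log: list = field(default_factory=list)  # one list of actions per commit

    def overwrite(self, new_files):
        # Write the new data files next to the existing ones (nothing is deleted)...
        self.files_on_disk.update(new_files)
        # ...then commit: logically remove every live file and add the new ones.
        actions = [("remove", name) for name in sorted(self.live_files())]
        actions += [("add", name) for name in new_files]
        self.log.append(actions)

    def live_files(self, version=None):
        # Replay the log up to `version` (inclusive) to get that version's files.
        last = len(self.log) - 1 if version is None else version
        live = set()
        for actions in self.log[: last + 1]:
            for kind, name in actions:
                if kind == "add":
                    live.add(name)
                else:
                    live.discard(name)
        return live


table = ToyDeltaTable()
table.overwrite(["part-0-v0.parquet"])  # version 0
table.overwrite(["part-0-v1.parquet"])  # version 1 overwrites version 0

print(sorted(table.files_on_disk))  # ['part-0-v0.parquet', 'part-0-v1.parquet']
print(table.live_files())           # {'part-0-v1.parquet'}
print(table.live_files(version=0))  # {'part-0-v0.parquet'}
```

Both files remain in `files_on_disk`, which is consistent with seeing old and new files side by side in the folder, yet only the file added by the latest commit belongs to the current table. Version 0 can still be reconstructed from the log, which is what makes reverting possible.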
