Can you write multiple streaming queries (same schema, different input sources) into the same Azure storage without overwriting?

Mayuri Kadam 81 Reputation points Microsoft Employee
2020-12-30T20:18:54.43+00:00

Hi,

I have a requirement to write multiple streaming queries (same schema, different input sources) into the same Azure blob delta lake gen 3 storage without overwriting. I need the data to coexist in the same write directory, as in 'append' mode, where no data gets overwritten. I am using Databricks Auto Loader. I recently saw this error message: com.databricks.sql.transaction.tahoe.ProtocolChangedException: The protocol version of the Delta table has been changed by a concurrent update. Please try the operation again.

Any thoughts?

Thanks.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. PRADEEPCHEEKATLA 90,226 Reputation points
    2020-12-31T05:35:00.427+00:00

    Hello @Mayuri Kadam ,

    Welcome to the Microsoft Q&A platform.

    The ProtocolChangedException occurs when a new table is being created in the same directory concurrently, i.e., when multiple streams write output to the same Delta location. Rerunning the same query should succeed, and subsequent runs should not hit the issue. If you are writing to a particular partition in overwrite mode, please use the Spark configuration below:

    sparkSession.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")  
    
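For the append case described in the question, here is a minimal PySpark-style sketch, not a definitive implementation. It assumes each source is read with Auto Loader (`cloudFiles`), that every query gets its own checkpoint location, and that the error class name appears in the Python-side exception text. The function names, the paths, and the `json` default file format are hypothetical and only for illustration.

```python
import time


def start_append_stream(spark, source_path, target_path, checkpoint_path,
                        schema, file_format="json"):
    """Start one Auto Loader stream that appends to a shared Delta path."""
    source = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)  # assumed input format
        .schema(schema)  # same schema for every source
        .load(source_path)
    )
    return (
        source.writeStream.format("delta")
        .outputMode("append")  # add rows; existing data is not replaced
        .option("checkpointLocation", checkpoint_path)  # unique per query
        .start(target_path)
    )


def run_with_retry(action, max_attempts=3, delay_seconds=5.0,
                   marker="ProtocolChangedException"):
    """Call action(); retry only when the error text mentions `marker`."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            # Assumption: the JVM exception class name shows up in the
            # Python exception's type name or message.
            retryable = marker in f"{type(exc).__name__}: {exc}"
            if not retryable or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

A streaming failure usually surfaces when you wait on the query rather than when you start it, so the retried action would typically cover both steps, for example `run_with_retry(lambda: start_append_stream(spark, "/input/a", "/delta/out", "/chk/a", schema).awaitTermination())`, with a different source and checkpoint path for each query.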

    Hope this helps. Do let us know if you have any further queries.

