Why does a Synapse Spark pool require a Databricks configuration?

Okan Mucahit Alaftekin 21 Reputation points
2022-08-24T13:19:22.633+00:00

To run the SQL command VACUUM Delta_Robots RETAIN 0 HOURS DRY RUN in a Synapse notebook, the setting SET spark.databricks.delta.retentionDurationCheck.enabled = false; has to be applied first. Why Databricks? The workspace has nothing to do with Databricks.

This is rather a change/development request for the product team.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 76,836 Reputation points Microsoft Employee
    2022-08-25T09:08:36.677+00:00

    Hello @Okan Mucahit Alaftekin,

    Thanks for the question and for using the Microsoft Q&A platform.

    UPDATE: Databricks contributed these libraries to open source, and the libraries keep "databricks" in their namespace. Renaming the namespace to remove "databricks" would diverge from the Delta.io OSS documentation.

    Delta Lake has a safety check to prevent you from running a dangerous VACUUM command. If you are certain that there are no operations being performed on this table that take longer than the retention interval you plan to specify, you can turn off this safety check by setting the Spark configuration property spark.databricks.delta.retentionDurationCheck.enabled to false.

    Note: If you do set spark.databricks.delta.retentionDurationCheck.enabled to false in your Spark config, you must choose an interval that is longer than the longest running concurrent transaction and the longest period that any stream can lag behind the most recent update to the table.

    Vacuuming works in a Synapse PySpark notebook with this code, run as Spark SQL:

    SET spark.databricks.delta.retentionDurationCheck.enabled = false;  
    VACUUM Delta_Robots RETAIN 0 HOURS DRY RUN   
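
The same two steps can also be issued from a Python cell. The following is only a minimal sketch, assuming `spark` is the SparkSession that the notebook provides; the helper name `vacuum_dry_run` and the constant `RETENTION_CHECK_KEY` are hypothetical and not part of Delta Lake or Synapse. It disables the retention check just for the duration of the VACUUM call and then restores the previous value, so the safety check is not left off for the rest of the session.

```python
# Sketch only: `vacuum_dry_run` and `RETENTION_CHECK_KEY` are hypothetical names.
# `spark` is assumed to be the SparkSession provided by the notebook.
RETENTION_CHECK_KEY = "spark.databricks.delta.retentionDurationCheck.enabled"


def vacuum_dry_run(spark, table_name, retain_hours=0):
    """Run VACUUM ... DRY RUN with the retention check off, then restore it."""
    # Remember the current setting; "true" is assumed when it was never set.
    previous = spark.conf.get(RETENTION_CHECK_KEY, "true")
    spark.conf.set(RETENTION_CHECK_KEY, "false")
    try:
        # DRY RUN only lists the files that would be deleted; nothing is removed.
        return spark.sql(
            f"VACUUM {table_name} RETAIN {retain_hours} HOURS DRY RUN"
        )
    finally:
        # Re-enable (or restore) the safety check for the rest of the session.
        spark.conf.set(RETENTION_CHECK_KEY, previous)
```

In a notebook this could be called as vacuum_dry_run(spark, "Delta_Robots") and the returned DataFrame inspected with show(). Restoring the setting afterwards is a design choice of this sketch, not something the SQL version above does.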
    

    For more details, refer to "Vaccuming with zero retention results in data loss" and "Vacuum a Delta table (Delta Lake on Azure Databricks)".

    Hope this helps. Please let us know if you have any further questions.
