Delta file vacuum immediately in Data Flow

Vignesh Rajendran 1 Reputation point
2025-02-19T17:04:39.2466667+00:00

Hi,

I am using an Azure Data Factory Data Flow where the source is a Parquet file and the sink is a Delta file.

I want to run vacuum immediately after the write and keep no history. Is there an option for that?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. phemanth 15,765 Reputation points Microsoft External Staff Moderator
    2025-02-19T19:20:29.1733333+00:00

    @Vignesh Rajendran

    Thank you for posting your query!

    In Azure Data Factory (ADF), a Data Flow can write to a Delta Lake sink, but it does not let you run Delta Lake commands such as VACUUM directly. To vacuum the table right after the write, run the command outside the Data Flow, as described below.

    Steps to Vacuum Delta Table

    1. Use a Databricks Notebook: You can create a Databricks notebook to run the vacuum command and then call this notebook from your ADF pipeline.
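    2. Vacuum Command: In the Databricks notebook, disable Delta's retention duration safety check, then vacuum the Delta table with zero retention to keep no history. Without the first line, VACUUM rejects a retention period below the default threshold of 7 days:
         spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
         spark.sql("VACUUM delta.`<path-to-delta-table>` RETAIN 0 HOURS")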
      
    3. Integration with ADF: In your ADF pipeline, add a Databricks Notebook activity that calls the notebook you created, and connect it to the Data Flow activity with an on-success dependency. The vacuum operation then runs as soon as the Data Flow completes.
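The notebook steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the exact notebook from the answer: the names `build_vacuum_sql`, `vacuum_no_history`, and `RETENTION_CHECK` are illustrative, and `spark` is assumed to be the session object that a Databricks notebook already provides. The safety check is switched off only for the duration of the VACUUM call and then put back to its previous value.

```python
RETENTION_CHECK = "spark.databricks.delta.retentionDurationCheck.enabled"


def build_vacuum_sql(table_path: str, retain_hours: int = 0, dry_run: bool = False) -> str:
    """Build a VACUUM statement for a path-based Delta table."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be >= 0")
    if "`" in table_path:
        # The path is wrapped in backticks, so a backtick inside it would break the statement.
        raise ValueError("table_path must not contain backticks")
    sql = f"VACUUM delta.`{table_path}` RETAIN {retain_hours} HOURS"
    if dry_run:
        sql += " DRY RUN"
    return sql


def vacuum_no_history(spark, table_path: str) -> None:
    """Vacuum with zero retention, then restore the retention safety check."""
    # Remember the current setting; "true" is assumed when it was never set.
    previous = spark.conf.get(RETENTION_CHECK, "true")
    # Delta refuses retention below the 7-day default while this check is enabled.
    spark.conf.set(RETENTION_CHECK, "false")
    try:
        spark.sql(build_vacuum_sql(table_path, retain_hours=0))
    finally:
        # Put the check back so later commands in the session stay protected.
        spark.conf.set(RETENTION_CHECK, previous)
```

In the notebook, the call could look like `vacuum_no_history(spark, dbutils.widgets.get("delta_path"))`, where `delta_path` is a hypothetical widget name filled in from the base parameters of the ADF Databricks Notebook activity.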

    Important Notes

    • The RETAIN 0 HOURS option in the VACUUM command removes every data file that the current version of the Delta table no longer references, so no history is kept. Use it with care: time travel and rollback to previous versions stop working, and files that concurrent readers or writers are still using can be deleted, so run it only when nothing else is touching the table.
    • Ensure that the ADF service has the necessary permissions to execute the notebook in Databricks.
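Because zero retention cannot be undone, one cautious option is to preview the deletion first with the DRY RUN clause of VACUUM, which lists files that would be removed without deleting them. The sketch below is an illustration only: `preview_vacuum` is a hypothetical helper name, `spark` is assumed to be the notebook's session, and the retention check is assumed to apply to a dry run in the same way as to a real run.

```python
def preview_vacuum(spark, table_path: str, retain_hours: int = 0) -> list:
    """Return the rows VACUUM ... DRY RUN reports, without deleting anything."""
    key = "spark.databricks.delta.retentionDurationCheck.enabled"
    # Remember the current setting; "true" is assumed when it was never set.
    previous = spark.conf.get(key, "true")
    # Assumption: the retention check must be off for a dry run below 7 days too.
    spark.conf.set(key, "false")
    try:
        statement = f"VACUUM delta.`{table_path}` RETAIN {retain_hours} HOURS DRY RUN"
        # Each returned row describes one file that a real run would delete.
        return spark.sql(statement).collect()
    finally:
        # Restore the previous setting whether or not the statement succeeded.
        spark.conf.set(key, previous)
```

A notebook could log `len(preview_vacuum(spark, "<path-to-delta-table>"))` before running the real command. Note that the dry-run listing is capped, so the count is a lower bound on large tables.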

    Please refer to: https://learn.microsoft.com/en-us/azure/data-factory/format-delta

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
