Thank you for posting your query!
In Azure Data Factory (ADF), when working with Delta Lake, you can manage the lifecycle of your Delta tables, including vacuuming, but there are some limitations when it comes to directly executing Delta Lake commands such as `VACUUM` from within a Data Flow.
Steps to Vacuum a Delta Table
- Use a Databricks Notebook: Create a Databricks notebook that runs the `VACUUM` command, then call this notebook from your ADF pipeline.
- Vacuum Command: In the Databricks notebook, use the following commands to vacuum the Delta table and keep no history. Note that Delta blocks retention periods under 168 hours by default, so the safety check must be disabled first:
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM delta.`<path-to-delta-table>` RETAIN 0 HOURS")
- Integration with ADF: In your ADF pipeline, use the Databricks Notebook activity to call the notebook you created, chained to the success output of your Data Flow activity. This ensures the vacuum operation runs immediately after your data flow completes. (A minimal notebook sketch follows this list.)
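Putting the steps above together, here is a minimal sketch of what the notebook could look like. The widget name `table_path` is illustrative, not prescribed; in ADF, pass the path through the Notebook activity's base parameters under the same name.

```python
# Minimal sketch of the vacuum notebook (Databricks / PySpark).
# Assumption: the ADF Databricks Notebook activity passes the Delta table
# path as a base parameter named "table_path" (an illustrative name).

# Read the table path passed in from the ADF pipeline.
dbutils.widgets.text("table_path", "")
table_path = dbutils.widgets.get("table_path")

# Delta refuses retention periods under 168 hours as a safety net;
# disable the check before retaining 0 hours.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

# Remove every file no longer referenced by the current table version.
spark.sql(f"VACUUM delta.`{table_path}` RETAIN 0 HOURS")
```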
Important Notes
- The `RETAIN 0 HOURS` option in the `VACUUM` command removes all files that are no longer referenced by the current version of the Delta table, effectively keeping no history. Be cautious with this option, as it makes it impossible to time travel or roll back to previous versions of the data.
- Because of this risk, Delta rejects retention periods shorter than 168 hours unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to `false`, as shown above. Also avoid running a zero-retention vacuum while other jobs are still reading from or writing to the table, since they may depend on the files being deleted.
- Ensure that the ADF service has the necessary permissions to execute the notebook in Databricks.
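If you want to confirm which versions you are about to lose, you can inspect the table's commit history before vacuuming; a quick check, using the same path placeholder as above:

```python
# List the table's commit history; time travel to these versions stops
# working once their underlying files are vacuumed away.
spark.sql("DESCRIBE HISTORY delta.`<path-to-delta-table>`").show(truncate=False)
```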
For more details, please refer to: https://learn.microsoft.com/en-us/azure/data-factory/format-delta
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.