Synapse data flow sink to Delta table: vacuum not working as expected

Paul Hernandez, Microsoft Employee
2021-09-02T14:46:19.35+00:00

Hi everyone,

We are writing to a Delta table from a Synapse data flow, using an inline dataset with the following settings:

[Screenshot: sink settings of the inline Delta dataset]

With Overwrite set as the table action, we write a full load to the target table every day, which adds a new snapshot to the table each time.

We left the vacuum setting at its default value of 0, which means 30 days.

After each run, the files from the previous load are marked for removal in the Delta transaction log:

{"remove":{"path":"part-00000-49dfde94-43b2-4444-a8b1-e683fb5552c5-c000.snappy.parquet","deletionTimestamp":1627905681270,"dataChange":true}}     

However, even after a month, the files are still not removed from the table location.
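Per the Delta Lake transaction-log protocol, `deletionTimestamp` in a `remove` action is milliseconds since the Unix epoch. A short Python sketch that decodes the entry quoted above (the helper name `parse_remove_action` is illustrative, not part of any Synapse or Delta API) shows the file was tombstoned at 2021-08-02 12:01:21 UTC, so with a 30-day threshold it should have become eligible for vacuum around 2021-09-01.

```python
import json
from datetime import datetime, timedelta, timezone

# The "remove" action quoted in the question, copied verbatim from the log.
LOG_LINE = '{"remove":{"path":"part-00000-49dfde94-43b2-4444-a8b1-e683fb5552c5-c000.snappy.parquet","deletionTimestamp":1627905681270,"dataChange":true}}'

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_remove_action(line):
    """Return (path, deletion time as a UTC datetime) for one "remove" action."""
    action = json.loads(line)["remove"]
    # deletionTimestamp counts milliseconds since the Unix epoch; adding a
    # timedelta keeps the conversion exact (no float rounding).
    deleted_at = EPOCH + timedelta(milliseconds=action["deletionTimestamp"])
    return action["path"], deleted_at


path, deleted_at = parse_remove_action(LOG_LINE)
print(path)
print(deleted_at.isoformat())  # 2021-08-02T12:01:21.270000+00:00
```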

Using a dry run, I checked which files should be removed:

[Screenshot: output of the vacuum dry run]

There are around 4000 files to be deleted.
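The question a dry run answers can be sketched as a cutoff check: a tombstoned file is a candidate once its deletion time is older than "now minus the retention threshold". This is a simplified model under stated assumptions, not Delta's implementation (the real VACUUM also scans the table directory and applies its own safety checks). The function name, file names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def files_past_retention(tombstones, retention_hours, now):
    """Return the sorted paths of tombstoned files deleted before now - retention_hours.

    tombstones: iterable of (path, deletion time as an aware datetime).
    Simplified model: only the tombstone cutoff is checked.
    """
    if retention_hours <= 0:
        # This sketch refuses non-positive values instead of guessing a default.
        raise ValueError("retention_hours must be positive")
    cutoff = now - timedelta(hours=retention_hours)
    return sorted(path for path, deleted_at in tombstones if deleted_at < cutoff)


# Hypothetical tombstones: one from early August, one from the previous day.
utc = timezone.utc
tombstones = [
    ("part-a.snappy.parquet", datetime(2021, 8, 2, 12, 1, tzinfo=utc)),
    ("part-b.snappy.parquet", datetime(2021, 9, 1, 12, 0, tzinfo=utc)),
]
now = datetime(2021, 9, 2, 14, 46, tzinfo=utc)
print(files_past_retention(tombstones, 720, now))  # ['part-a.snappy.parquet']
```

With 720 hours the cutoff is 2021-08-03 14:46 UTC, so only the August tombstone qualifies.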

If I run VACUUM from a notebook, the files are removed.
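For reference, Delta Lake's SQL form is `VACUUM <table> [RETAIN <n> HOURS] [DRY RUN]`, and a path-based table can be addressed as ``delta.`<path>` ``. A small Python sketch that builds such a statement with an explicit retention; the helper name and the `abfss://` path are hypothetical placeholders, and in a Synapse Spark notebook the resulting string could be passed to `spark.sql(...)`.

```python
def build_vacuum_sql(table_path, retain_hours, dry_run=False):
    """Build a Delta Lake VACUUM statement for a path-based table."""
    if retain_hours <= 0:
        # Always state the retention explicitly rather than relying on a default.
        raise ValueError("retain_hours must be positive")
    sql = f"VACUUM delta.`{table_path}` RETAIN {retain_hours} HOURS"
    if dry_run:
        # DRY RUN only lists the files that would be deleted.
        sql += " DRY RUN"
    return sql


# Hypothetical table location; replace with the sink's folder path.
TABLE_PATH = "abfss://container@account.dfs.core.windows.net/delta/target"
print(build_vacuum_sql(TABLE_PATH, 720, dry_run=True))
```

The Python API offers the same operation via `DeltaTable.forPath(spark, path).vacuum(720)` from the `delta` package.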

Why does the vacuum in the data flow not remove the files older than the retention threshold? Am I missing something, or is this a bug?

Any information will be appreciated.

Best regards,

Paul Hernandez

Azure Synapse Analytics

Accepted answer
    Paul Hernandez, Microsoft Employee
    2021-10-11T15:06:26.937+00:00

    Hi everyone, hi @PRADEEPCHEEKATLA-MSFT,

    I have an interesting finding.

    We changed the vacuum value from 0 (which is supposed to mean 30 days) to 720 (the number of hours in 30 days), and it worked.

    It seems that the default value of 0 does not take effect.

    For now, the workaround is to set the retention you want explicitly, in hours, instead of relying on the default.

    BR.
    Paul
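The workaround comes down to 30 days * 24 hours = 720 hours. A tiny Python helper (hypothetical, not part of Synapse) that converts a retention period in days into the value to enter in the sink's Vacuum field, refusing 0 so the default is never relied on:

```python
def vacuum_hours_from_days(days):
    """Convert a retention period in days to the hours value for the sink's Vacuum setting."""
    if days <= 0:
        # The accepted answer found that relying on 0 did not take effect,
        # so this sketch insists on an explicit, positive value.
        raise ValueError("set an explicit, positive retention instead of 0")
    return days * 24


print(vacuum_hours_from_days(30))  # 720
```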
