Azure Synapse Analytics is an Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It was previously known as Azure SQL Data Warehouse.
When working with Azure Synapse Link for Dataverse, data from the Dataverse audit logs is continuously streamed into a Lake Database in Azure Synapse Analytics. That data is stored in Azure Data Lake Storage (ADLS) as Parquet files (partitioned by year), alongside temporary CSV files.
Since you’re using Synapse Link for Dataverse, the Audit and Audit_Partitioned tables in your Lake Database are populated from the Parquet files stored in ADLS. It’s important to note that Parquet files are append-only and immutable, so you cannot simply run a DELETE statement on the Lake Database. Instead, you need to delete the underlying Parquet files to remove old audit entries.
Why You Can’t Delete Directly from the Lake Database
The Audit and Audit_Partitioned tables in Synapse Link for Dataverse are just external references to the Parquet files in ADLS. You cannot use SQL DELETE statements on the Lake Database because the Parquet files are immutable; SQL queries only operate on metadata and do not delete the actual underlying files.
Recommended Ways to Delete Old Audit Entries
Since you cannot delete records directly, here are some effective methods to manage and delete old audit data:
Lifecycle Management Policy (No-Code Approach)
Best For - Fully automated, no-code solution to delete old audit files.
How it Works - Azure Storage Lifecycle Management policies can automatically delete Parquet files after a certain retention period (e.g., six months).
Pros - Completely automated, requires no manual intervention or complex setups.
Cons - Lifecycle management operates based on file modification dates, not partition values, and deletes entire files, not specific rows.
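As a rough sketch, a lifecycle rule that removes audit Parquet files roughly six months after they were last modified might look like the following. It is built here as a Python dict mirroring the JSON policy document you would apply to the storage account (via the portal, az CLI, or the management SDK); the "dataverse-container/audit" prefix is a placeholder you would replace with the container and folder your Synapse Link profile actually writes to.

```python
import json

RETENTION_DAYS = 180  # ~six months; adjust to your retention requirement

# Lifecycle policy document for the storage account. The prefix below is
# a placeholder - substitute the container/folder that Synapse Link for
# Dataverse writes the Audit table into.
lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "delete-old-audit-parquet",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["dataverse-container/audit"],
                },
                "actions": {
                    "baseBlob": {
                        "delete": {
                            "daysAfterModificationGreaterThan": RETENTION_DAYS
                        }
                    }
                },
            },
        }
    ]
}

print(json.dumps(lifecycle_policy, indent=2))
```

Note that the rule deletes whole blobs once they pass the modification-age threshold, which matches the file-level (not row-level) behavior described above.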
SQL Drop Partition (SQL-Based)
Best For - If your Audit_Partitioned table is partitioned by a field like year (e.g., year=2023).
How it Works - You can drop partitions with an ALTER TABLE ... DROP PARTITION statement. However, this only removes the partition metadata from the Lake Database; it does not delete the underlying Parquet files in ADLS.
Pros - Simple SQL operation if partitioned by time. It's a fast metadata-only operation.
Cons - You must manually manage the files in ADLS. Dropping a partition doesn't delete the Parquet files themselves.
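Because dropping a partition leaves the files behind, you still need a file-level cleanup pass afterwards. A minimal sketch, assuming the year=YYYY folder layout described above, of deciding which partition folders fall outside your retention window (the actual deletion would then go through your storage tool of choice):

```python
import re

def partitions_to_delete(folder_names, keep_from_year):
    """Return the year=YYYY partition folders older than keep_from_year.

    folder_names: iterable of partition folder names, e.g. ["year=2022", ...]
    keep_from_year: the first year to keep; earlier partitions are returned.
    """
    pattern = re.compile(r"^year=(\d{4})$")
    to_delete = []
    for name in folder_names:
        match = pattern.match(name)
        if match and int(match.group(1)) < keep_from_year:
            to_delete.append(name)
    return sorted(to_delete)

# Example: keep 2024 onwards, flag everything earlier for deletion.
print(partitions_to_delete(["year=2022", "year=2023", "year=2024"], 2024))
# -> ['year=2022', 'year=2023']
```

Keeping the selection logic separate from the deletion itself makes it easy to dry-run the list before removing anything from ADLS.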
Azure Data Factory (ADF) Pipeline (Custom Approach)
Best For - More controlled, scheduled deletion of old audit data.
How it Works - You can create an ADF pipeline to identify and delete Parquet files older than a certain threshold. This can be scheduled to run periodically.
Pros - Gives you more control over which files to delete and when.
Cons - Requires ADF pipeline setup and can incur costs based on the number of pipeline runs.
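The filtering logic such a pipeline applies (list the files with a Get Metadata activity, keep only those whose last-modified timestamp is past the threshold, then hand them to a Delete activity) can be sketched in plain Python. The file names and the 180-day threshold here are illustrative only:

```python
from datetime import datetime, timedelta, timezone

def files_older_than(files, retention_days, now=None):
    """Select files whose last-modified time is past the retention window.

    files: iterable of (name, last_modified_datetime) pairs, as you would
    get from listing the ADLS folder. Returns the names to delete.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [name for name, modified in files if modified < cutoff]

# Illustrative listing with fixed timestamps so the result is deterministic.
now = datetime(2024, 7, 1, tzinfo=timezone.utc)
listing = [
    ("audit-2023-part-0001.parquet", datetime(2023, 9, 1, tzinfo=timezone.utc)),
    ("audit-2024-part-0001.parquet", datetime(2024, 6, 1, tzinfo=timezone.utc)),
]
print(files_older_than(listing, retention_days=180, now=now))
# -> ['audit-2023-part-0001.parquet']
```

In ADF itself this comparison would live in a Filter activity expression over the Get Metadata output, but the decision rule is the same.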
Conclusion and Recommendations
Best Choice for Simplicity - Use Lifecycle Management if you just want to automatically delete files older than a certain period with minimal configuration.
Best for Partitioned Tables - Use SQL DROP PARTITION if your audit table is partitioned, and you only need to clean up metadata (but you’ll still need to manually remove the Parquet files from ADLS).
Best for Granular Control - Use ADF Pipelines if you need more control over the deletion process, such as selective removal of files or additional logic for deletion.
This approach provides flexibility in managing your audit data based on your specific requirements.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.