Azure Synapse Link for Dataverse - How to delete audit entries after a specified period of time

Umar Ali
2025-03-13T12:14:43.24+00:00

We have a Synapse Link for Dataverse that sends audit table data into Synapse.

When setting up the Synapse Link for Dataverse, Audit and Audit_Partitioned tables were automatically created. This setup works well, and all audit entries appear as expected.

We now have a requirement to delete all audit entries after a specified period, e.g., remove all audit entries older than six months from Synapse.

How can we achieve this? When examining the storage account, we see CSV audit files and partitioned Parquet files (partitioned by year). My understanding is that all data in the Audit and Audit_Partitioned tables comes from the partitioned Parquet files, and that the CSV audit files are temporary until they are incorporated into the partitioned Parquet files.

Can we run a DELETE statement on the Lake Database? If so, will this update the underlying files, or is that not possible from the Lake Database?

Is there any other straightforward way to do this?

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.


Answer accepted by question author

Chandra Boorla (Microsoft External Staff, Moderator)
2025-03-13T15:27:13.5633333+00:00

    @Umar Ali

    When working with Azure Synapse Link for Dataverse, the data from Dataverse Audit Logs is continuously streamed into a Lake Database in Azure Synapse Analytics. This data is stored in Azure Data Lake Storage (ADLS) as Parquet files (partitioned by year) and temporary CSV files.

    Since you’re using Synapse Link for Dataverse, the Audit and Audit_Partitioned tables in your Lake Database are populated from the Parquet files stored in ADLS. It’s important to note that Parquet files are append-only and immutable, so you cannot simply run a DELETE statement on the Lake Database. Instead, you need to delete the underlying Parquet files to remove old audit entries.

    Why You Can’t Delete Directly from the Lake Database

    The Audit and Audit_Partitioned tables in Synapse Link for Dataverse are just external references to the Parquet files in ADLS. You cannot use SQL DELETE statements on the Lake Database because the Parquet files are immutable; SQL queries only operate on metadata and do not delete the actual underlying files.

Recommended Ways to Delete Old Audit Entries

Since you cannot delete records directly, here are some effective methods to manage and delete old audit data:

    Lifecycle Management Policy (No-Code Approach)

    Best For - Fully automated, no-code solution to delete old audit files.

    How it Works - Azure Storage Lifecycle Management policies can automatically delete Parquet files after a certain retention period (e.g., six months).

    Pros - Completely automated, requires no manual intervention or complex setups.

    Cons - Lifecycle management operates based on file modification dates, not partition values, and deletes entire files, not specific rows.
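As a concrete illustration of the lifecycle-management approach, the snippet below builds the policy document as a Python dict and prints it as JSON. The container and prefix (`dataverse-container/audit`) and the rule name are hypothetical placeholders; substitute the path your Synapse Link profile actually writes to, and adjust the 180-day threshold to your retention period.

```python
import json

# Sketch of an Azure Storage lifecycle-management policy that deletes
# blobs 180 days after their last modification. The prefix below is a
# placeholder -- point it at the folder your Synapse Link profile uses.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "delete-old-audit-files",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["dataverse-container/audit"],
                },
                "actions": {
                    "baseBlob": {
                        # Roughly six months, counted from last modification.
                        "delete": {"daysAfterModificationGreaterThan": 180}
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```

Saved to a file, a policy like this can be applied with the Azure CLI (for example, `az storage account management-policy create --policy @policy.json`), or pasted into the Lifecycle management blade in the portal. Keep in mind the caveat above: the rule fires on file modification date, not on the partition values inside the Parquet files.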

    SQL Drop Partition (SQL-Based)

    Best For - If your Audit_Partitioned table is partitioned by a field like year (e.g., year=2023).

    How it Works - You can drop partitions using a DROP PARTITION SQL command. However, this will only remove the metadata from the Lake Database, not the actual Parquet files in ADLS.

    Pros - Simple SQL operation if partitioned by time. It's a fast metadata-only operation.

    Cons - You must manually manage the files in ADLS. Dropping a partition doesn't delete the Parquet files themselves.
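To make the partition-drop idea concrete, here is a small Python sketch that decides which year partitions fall entirely outside the retention window and emits the corresponding Spark SQL statements. A year partition is only safe to drop once December 31 of that year is older than the cutoff, otherwise it still contains rows you want to keep. The table name `Audit_Partitioned` comes from the question; the cutoff date and the assumption that these statements can be run against your Lake Database from a Spark pool are illustrative.

```python
from datetime import date

def partitions_to_drop(partition_years, cutoff):
    """Return the year partitions whose entire range ends before the cutoff.

    A partition like year=2023 is only safe to drop when 2023-12-31 is
    older than the cutoff date; otherwise it still holds in-retention rows.
    """
    return [y for y in partition_years if date(y, 12, 31) < cutoff]

# Hypothetical six-month retention evaluated on 2025-03-13.
cutoff = date(2024, 9, 13)
years = [2021, 2022, 2023, 2024, 2025]

for y in partitions_to_drop(years, cutoff):
    # Spark SQL (Hive syntax) to detach the partition's metadata.
    # This does NOT delete the Parquet files in ADLS.
    print(f"ALTER TABLE Audit_Partitioned DROP IF EXISTS PARTITION (year={y})")
```

Note that, as stated above, this is a metadata-only operation: after dropping the partitions you still need to remove the underlying Parquet folders from ADLS yourself.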

    Azure Data Factory (ADF) Pipeline (Custom Approach)

    Best For - More controlled, scheduled deletion of old audit data.

    How it Works - You can create an ADF pipeline to identify and delete Parquet files older than a certain threshold. This can be scheduled to run periodically.

    Pros - Gives you more control over which files to delete and when.

    Cons - Requires ADF pipeline setup and can incur costs based on the number of pipeline runs.
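The ADF pipeline's core logic is "list files, keep those older than the threshold, delete them." The sketch below isolates that filter step in plain Python so it can be reasoned about independently; the file paths and timestamps are made-up sample data standing in for what a Get Metadata activity (or an ADLS listing via the azure-storage-file-datalake SDK) would return, and the actual deletion would be an ADF Delete activity or an SDK call.

```python
from datetime import datetime, timedelta, timezone

def files_older_than(files, retention_days, now=None):
    """Filter (path, last_modified) pairs down to those past retention.

    Mirrors the age check an ADF pipeline would apply before its
    Delete activity removes old Parquet files.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [path for path, modified in files if modified < cutoff]

# Hypothetical listing, as a Get Metadata activity might return it.
now = datetime(2025, 3, 13, tzinfo=timezone.utc)
listing = [
    ("audit/year=2023/part-0001.parquet", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    ("audit/year=2024/part-0001.parquet", datetime(2024, 12, 20, tzinfo=timezone.utc)),
]

# Only the 2023 file is older than the ~6-month (180-day) cutoff.
print(files_older_than(listing, retention_days=180, now=now))
```

Scheduling this as a pipeline trigger (e.g., monthly) keeps the retention rolling; unlike the lifecycle policy, you can add arbitrary extra logic here, such as skipping files whose partition folder is still within the retention year.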

    Conclusion and Recommendations

    Best Choice for Simplicity - Use Lifecycle Management if you just want to automatically delete files older than a certain period with minimal configuration.

    Best for Partitioned Tables - Use SQL DROP PARTITION if your audit table is partitioned, and you only need to clean up metadata (but you’ll still need to manually remove the Parquet files from ADLS).

    Best for Granular Control - Use ADF Pipelines if you need more control over the deletion process, such as selective removal of files or additional logic for deletion.

    This approach provides flexibility in managing your audit data based on your specific requirements.

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.
