Identical history records for synapse link in parquet files

Xi Le 0 Reputation points
2024-05-25T03:29:54.09+00:00

Hello, I have an issue. When we create a synapse link with our d365 fo to link the dataverse and its tables to retrieve it in synapse something happens. The files are for some reasons duplicated with different dates but with the same size everytime that there's any changes. It's in append only mode. Incremental refresh at 15. By default in parquet files for the d365 fo (the Azure synapse analytic and spark pool are checked in the creation of the synapse link). I saw that for salesline, the duplication reset each monday. There's a 400kb file and a 80kb file. For example, tuesday there will be another 408kb file and another 82kb file. The 80kb and 82kb files contains the id x, but not the 400kb and 408kb files. When I query the data in synapse, I have litteraly duplicated rows. Identical history records, is it normal? If I query it in Fabric in the lakehouse with a select distinct, I can see that the rows are litteraly duplicated because everytime there's a sync, files are cumulated. Another example would be with the dataarea, this one doesn't change since the datas are not updated frequently but when we add the table to the dataverse (d365 fo) to the synapse link, then there's 4 parquet files at 9.3kb, same rows litterally, there's no new rows. I can't tell why it's doing that, what's the cause of this? How can I debug it? Btw, we used to use the export to datalake in the system administration of the d365 fo and in csv we never had this problem of duplications.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,584 questions
Dynamics 365 Training
Dynamics 365 Training
Dynamics 365: A Microsoft cloud-based business platform that provides customer relationship management and enterprise resource planning solutions.Training: Instruction to develop new skills.
64 questions
Microsoft Dataverse Training
Microsoft Dataverse Training
Microsoft Dataverse: A Microsoft service that enables secure storage and management of data used by business apps. Previously known as Common Data Service.Training: Instruction to develop new skills.
13 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Harishga 5,270 Reputation points Microsoft Vendor
    2024-05-27T09:49:17.3866667+00:00

    Hi @Xi Le
    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    To address the issue of duplicated files and rows in Azure Synapse Analytics, here are some steps you can take:

    • Check the export settings: Ensure that the export to data lake feature is configured correctly. Azure Synapse Link for Dataverse should be set to save data in Parquet Delta Lake format by default, which can help with query response times.
    • Review data transactions: Verify that the Create, Update, and Delete (CUD) transactions are being handled correctly. Azure Synapse Link supports continuous replication of entity and table data, including CUD transactions.
    • Update versions: Make sure you have the latest cumulative updates for your finance and operations application versions. There might be fixes in the latest updates that address the duplication issue.
    • In-place updates vs. append-only writes: If you’re using in-place updates, consider switching to append-only writes to prevent duplicate records from being created.
    • Event-based triggers: Set the storage event to only the model. Json file, so every end of the Synapse link cycle will trigger your pipeline to run. This can help manage when data is exported and processed.
    • Pipeline creation: If necessary, create a pipeline using copy data activity to move your data without using shortcuts. This can help ensure that your D365 F&O tables are linked correctly, and that data is copied accurately.
    • Monitor file generation: Observe the pattern of file generation, especially around the reset period on Mondays for sales line. This can give you insights into the process that’s causing the duplication.

    Reference:
    https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-select-fno-data

    https://d365hub.com/Posts/Details/9f4e78ff-a352-4cf6-9ba2-62139446cf06/fixed-duplicate-records-in-azure-synapse-link-dataverse

    https://stackoverflow.com/questions/77807929/azure-synapse-link-for-d365-dataverse-event-based-trigger-for-storage-change

    I hope this information helps you. Let me know if you have any further questions or concerns.

    0 comments No comments