Copy Dataverse data into Azure SQL using Synapse Link

Murali Ramasamy 5 Reputation points
2024-10-23T15:49:18.2433333+00:00

I'm using an Azure Synapse pipeline to copy data from Dataverse to an Azure SQL database. For large tables it stops after copying 10 million records and does not insert anything beyond that. The table has about 15 million records.

I don't see any setting on the pipeline that would limit this. Does anyone have a solution?

Thanks

Murali

Azure SQL Database · Azure Data Lake Storage · Azure Synapse Analytics · Azure Data Factory · Microsoft Dataverse Training

1 answer

  1. Amira Bedhiafi 25,946 Reputation points
    2024-10-23T20:54:58.3566667+00:00

    Although you mentioned not finding any settings limitations, it's worth reviewing whether there are limits on the data transfer itself, particularly on the Dataverse side or within Synapse when dealing with large volumes of data.

    Dataverse might have API throttling or paging limitations. Check the Dataverse service protection API limits, which can restrict how many records you retrieve in a single operation; result sets larger than one page are only returned if the caller follows the paging links.
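
    As a quick check outside the pipeline, you can page through the Dataverse Web API yourself and confirm that rows past the 10 million mark are actually returned when the paging links are followed. This is only a minimal sketch of the paging mechanics (not a practical way to pull 15 million rows); the organization URL, table name, and bearer token below are placeholders for your own values.

    ```python
    import requests

    # Placeholders - replace with your own environment, table, and token.
    ORG_URL = "https://yourorg.crm.dynamics.com"
    TABLE = "accounts"                      # plural (entity set) name of the table
    TOKEN = "<bearer token from MSAL / azure-identity>"

    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
        # Ask for up to 5000 rows per page; larger result sets are paged.
        "Prefer": "odata.maxpagesize=5000",
    }

    # Select a single small column just to keep the payload light.
    url = f"{ORG_URL}/api/data/v9.2/{TABLE}?$select=createdon"
    total = 0
    while url:
        resp = requests.get(url, headers=headers, timeout=60)
        resp.raise_for_status()
        payload = resp.json()
        total += len(payload["value"])
        # Dataverse returns @odata.nextLink while more pages remain;
        # a client that ignores it stops early and undercounts.
        url = payload.get("@odata.nextLink")

    print(f"Retrieved {total} rows by following the paging links")
    ```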

    You may also want to verify that the batch size for your copy activity is appropriate. The default write batch size may not be ideal for large datasets; you can adjust it in the sink settings of the Copy activity (the Write batch size property) within the Synapse pipeline.

    For tables with more than 10 million records, consider using an incremental loading strategy. Instead of copying the entire table at once, you could break the data transfer into smaller chunks, either by partitioning the data or by using a date-based incremental load.

    You can achieve this by configuring the Synapse pipeline to copy data in batches based on a specific condition, such as a timestamp or an indexed key column.
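
    If it helps to prototype the idea outside the pipeline first, the sketch below walks the source in date-based slices and bulk-inserts each slice into Azure SQL with pyodbc, advancing a watermark as it goes. The connection strings, the account table, and the modifiedon column are assumptions; substitute whatever your Synapse Link export and target schema actually contain, and persist the watermark somewhere durable between runs.

    ```python
    from datetime import datetime, timedelta
    import pyodbc

    # Assumed connection strings - replace server names, databases, and auth.
    SOURCE_CONN = ("Driver={ODBC Driver 18 for SQL Server};"
                   "Server=<synapse-serverless>.sql.azuresynapse.net;"
                   "Database=dataverse_lake;Authentication=ActiveDirectoryInteractive;")
    TARGET_CONN = ("Driver={ODBC Driver 18 for SQL Server};"
                   "Server=<azuresql>.database.windows.net;"
                   "Database=target_db;Authentication=ActiveDirectoryInteractive;")

    SLICE = timedelta(days=30)            # size of each date-based chunk
    watermark = datetime(2015, 1, 1)      # last modifiedon value already loaded
    now = datetime.utcnow()

    src_cur = pyodbc.connect(SOURCE_CONN).cursor()
    tgt = pyodbc.connect(TARGET_CONN)
    tgt_cur = tgt.cursor()
    tgt_cur.fast_executemany = True       # send the inserts in large batches

    while watermark < now:
        upper = min(watermark + SLICE, now)
        src_cur.execute(
            "SELECT accountid, name, modifiedon FROM account "
            "WHERE modifiedon >= ? AND modifiedon < ?",
            watermark, upper,
        )
        rows = src_cur.fetchall()

        if rows:
            tgt_cur.executemany(
                "INSERT INTO dbo.account (accountid, name, modifiedon) VALUES (?, ?, ?)",
                [tuple(r) for r in rows],
            )
            tgt.commit()

        print(f"{watermark:%Y-%m-%d} -> {upper:%Y-%m-%d}: {len(rows)} rows copied")
        watermark = upper                 # in practice, store this watermark durably
    ```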

    Review and increase the Data Integration Units (DIUs, previously called Data Movement Units) on the copy activity in Azure Synapse. DIUs control the throughput of data movement between services (such as Dataverse to SQL). For larger data volumes, increasing the DIU setting can make transfers faster and more consistent.

    Consider leveraging Data Flows or a staged bulk load where available. PolyBase (or the COPY statement) is the efficient way to load large datasets into a Synapse dedicated SQL pool from Azure Data Lake; for Azure SQL Database, which does not support PolyBase, a bulk load from the lake files over an external data source gives a similar result.
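
    For an Azure SQL Database target specifically, a rough sketch of that staged approach is shown below: point BULK INSERT at the CSV files that Synapse Link already lands in the storage account, through an external data source. The data source, credential, file path, and table names are assumptions for illustration; the one-time CREATE DATABASE SCOPED CREDENTIAL / CREATE EXTERNAL DATA SOURCE setup is only outlined in the comment.

    ```python
    import pyodbc

    # Assumed target connection - replace server, database, and authentication.
    CONN = ("Driver={ODBC Driver 18 for SQL Server};"
            "Server=<azuresql>.database.windows.net;Database=target_db;"
            "Authentication=ActiveDirectoryInteractive;")

    # One-time setup, run separately by an admin (sketched here as a comment):
    #   CREATE DATABASE SCOPED CREDENTIAL LakeCred WITH IDENTITY = 'Managed Identity';
    #   CREATE EXTERNAL DATA SOURCE DataverseLake
    #       WITH (TYPE = BLOB_STORAGE,
    #             LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
    #             CREDENTIAL = LakeCred);

    bulk_load = """
    BULK INSERT dbo.account
    FROM 'account/part-0001.csv'           -- path relative to the data source location
    WITH (
        DATA_SOURCE = 'DataverseLake',     -- external data source created above
        FORMAT = 'CSV',
        FIELDQUOTE = '"',
        FIRSTROW = 2,                      -- skip a header row (drop if the files have none)
        TABLOCK                            -- bulk update lock for a faster load
    );
    """

    with pyodbc.connect(CONN, autocommit=True) as cnxn:
        cnxn.cursor().execute(bulk_load)
        print("Bulk load finished")
    ```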

