Synapse copy activity sometimes does not complete

Anonymous
2022-06-15T07:50:26.44+00:00

We have created a pipeline in Synapse that reads JSON files from an Azure Data Lake and copies the data into a Synapse dedicated SQL pool. The pipeline functions as expected, except that on occasion the copy activity used to transfer the data does not complete for larger files. We sometimes receive initial loads from an external party that can be a single JSON file of 300+ MB. It seems to be hit and miss whether the copy activity completes. If it does, it runs in under a minute. If it doesn't, it either fails with an Out of Memory exception (not the scope of this question) or it simply never completes. I can't find any further details on the latter problem, and I hope someone here can help me understand what is going on. Below is a screenshot of the copy activity monitor:
[Screenshot: copy activity monitor, run ID 78523b66-93c7-4eda]

As you can see, it has been running for 14 hours now and the throughput has dropped to almost nothing. Reading the file seems to be fine, but writing is not happening. The job runs on an AutoResolveIntegrationRuntime, as we currently don't have any self-hosted runtimes, so I cannot adjust the compute settings. This copy activity is one of many that were executed in parallel during the same run, and it is the only one that does not complete.
Hoping someone here can help me understand why it's not writing to the dedicated pool.

Tags: Azure Synapse Analytics, Azure Data Factory

1 answer

  1. AnnuKumari-MSFT (Microsoft Employee, Moderator)
    2022-06-16T08:51:31.803+00:00

    Hi @Anonymous ,

    Thank you for using the Microsoft Q&A platform and for posting your query.

    As I understand your issue, you are trying to copy data from Azure Data Lake Storage to a dedicated SQL pool, but it is having performance issues for larger files. Please let me know if my understanding is incorrect.

    As can be seen in the screenshot, it is suggested to use PolyBase while copying data to Azure Synapse.

    Please refer to the following article for more details: Use PolyBase to load data into Azure Synapse Analytics and Staged copy by using PolyBase

    Using PolyBase is an efficient way to load a large amount of data into Azure Synapse Analytics with high throughput. You'll see a large gain in the throughput by using PolyBase instead of the default BULKINSERT mechanism.
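    As a sketch only (the activity name, the staging linked service `StagingBlobLS`, and the staging path are placeholders, not taken from your pipeline), enabling PolyBase together with a staged copy on the Copy activity sink typically looks like this:

    ```json
    {
      "name": "CopyJsonToDedicatedPool",
      "type": "Copy",
      "typeProperties": {
        "source": { "type": "JsonSource" },
        "sink": {
          "type": "SqlDWSink",
          "allowPolyBase": true,
          "polyBaseSettings": {
            "rejectType": "percentage",
            "rejectValue": 10.0,
            "useTypeDefault": true
          }
        },
        "enableStaging": true,
        "stagingSettings": {
          "linkedServiceName": {
            "referenceName": "StagingBlobLS",
            "type": "LinkedServiceReference"
          },
          "path": "staging-container/path"
        }
      }
    }
    ```

    Note that PolyBase can load directly only from formats such as delimited text, ORC, and Parquet; for a JSON source, the staged copy step first converts the data into a PolyBase-compatible format in the staging store, which is why `enableStaging` is paired with `allowPolyBase` here.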

    Also, you can consider increasing the Data Integration Units (DIUs) and the degree of parallelism (parallel copies) to improve the performance of the copy activity.

    Kindly refer to this article for more details: Data Integration Units and Parallel copy
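    For illustration (the values 32 and 8 are arbitrary examples, not recommendations), both settings are specified directly on the Copy activity's `typeProperties`:

    ```json
    {
      "name": "CopyJsonToDedicatedPool",
      "type": "Copy",
      "typeProperties": {
        "dataIntegrationUnits": 32,
        "parallelCopies": 8
      }
    }
    ```

    On the Azure integration runtime both settings default to auto-resolved values; setting them explicitly lets you control how much compute and parallelism a given copy run uses.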

    Hope this helps. Please let us know if you have any further queries.

