Synapse pipeline Ingest oracle table with XML type column into ADLS Gen2

Jane Zhang 25 Reputation points
2023-11-10T02:34:24.1666667+00:00

I am trying to ingest an oracle table (on premise) with XMLTYPE column into ADLS Gen2 with synapse pipeline. The total table size is 32G, more than 31G is consumed by XMLTYPE column.

If just ingesting less than 100 records, it works fine, and done in 17s. However, if more than 100, it took a long time. Each run will fail after 1 hour and 41 mins with the following error message.

    "errorCode": "1000",
    "message": "ErrorCode=SystemErrorActivityRunExecutedMoreThanOnce,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The activity run failed due to service internal error, please retry this activity run later.,Source=Microsoft.DataTransfer.TransferTask,'",
    "failureType": "SystemError",
    "target": "Copy_5g8",
    "details": []

User's image

Is there any way to fix the issue without scale out self-hosted IR?

Thanks!

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
{count} votes

Accepted answer
  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2023-11-10T05:18:01.45+00:00

    Hi Jane Zhang,

    Thank you for posting query in Microsoft Q&A Platform.

    Based on the error message you provided, it seems that the activity run failed due to a service internal error. This error message suggests that the issue is not related to the size of the data you are ingesting, but rather a problem with the service itself.

    However, to answer your question, if you are trying to ingest a large amount of data, it is recommended to use a self-hosted integration runtime with more resources. This can help to improve the performance and avoid issues like the one you are experiencing.

    If you do not want to scale out your self-hosted IR, you can try optimizing your pipeline to improve its performance. Here are some suggestions:

    Use the PolyBase feature in Azure Synapse Analytics to load data from Oracle to ADLS Gen2. PolyBase can provide a faster and more efficient way to load data from Oracle to ADLS Gen2.

    Use partitioning to split the data into smaller chunks and load them in parallel. This can help to improve the performance of your pipeline.

    I hope this helps! Let me know if you have any other questions.


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.