ADF dataflow timeout at Kusto Sink

Zixuan Li 46 Reputation points Microsoft Employee
2023-07-07T22:02:12.4533333+00:00

Hi all, I have created an Azure Data Factory dataflow where the sink is an ADX Kusto table. The dataflow does three joins in parallel, then unions the join results and writes to the sink. About 33 million rows are written to the sink, and I'm getting the following timeout error:
Operation on target Copy TechProfileAPICombined Primary failed: {"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'TechProfileUserSubscriptionPrimary': Error in post request:Connect to ingest-kustocxpcoprod.westus.kusto.windows.net:443 [ingest-kustocxpcoprod.westus.kusto.windows.net/52.250.220.31] failed: Connection timed out (Connection timed out)","Details":"shaded.msdataflow.com.microsoft.azure.kusto.data.exceptions.DataClientException: Error in post request:Connect to ingest-kustocxpcoprod.westus.kusto.windows.net:443 [ingest-kustocxpcoprod.westus.kusto.windows.net/52.250.220.31] failed: Connection timed out (Connection timed out)

I'm not sure if it's because the joins are taking too much time due to the large size of the table. I have verified that the connection is good, and the pipeline succeeds when the dataset is small. I have tried scaling out the ADX instance, increasing the Kusto query timeout limit, and increasing the dataflow compute size (to 128 cores, +16 driver cores), but none of these approaches work.

[Attached screenshots: MicrosoftTeams-image (13).png, MicrosoftTeams-image (14).png]

Tags: Azure Data Explorer, Azure Data Factory

Accepted answer
  Sander van de Velde | MVP 34,201 Reputation points
    2023-07-07T22:15:29.5166667+00:00

    Hello @Zixuan Li,

    Azure Data Explorer ingests data using either a batching or streaming mechanism.

    Here, batching is used.

    According to the documentation, a single batch ingestion command is limited to a maximum size of 6 GB:

    The data size limit for a batch ingestion command is 6 GB.

    It is recommended to add a splitting or partitioning mechanism in your data flow so that individual ingestion batches stay well below that limit (see the sketch below).
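
    On the ADX side, you can also inspect and tune the sink table's ingestion batching policy, which controls how queued ingestion seals batches by time, item count, and size. A minimal KQL sketch, using the sink table name from the error message; the thresholds shown are illustrative assumptions, not values from this thread:

    ```kusto
    // Show the current ingestion batching policy for the sink table.
    .show table TechProfileUserSubscriptionPrimary policy ingestionbatching

    // Illustrative values only: seal a batch after 5 minutes, 500 items, or 1 GB of raw data,
    // keeping each aggregated batch well below the 6 GB batch ingestion limit.
    .alter table TechProfileUserSubscriptionPrimary policy ingestionbatching
        '{"MaximumBatchingTimeSpan": "00:05:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'
    ```

    Whether tuning this policy helps depends on how the data flow sink ingests; partitioning the output within the data flow itself remains the primary mitigation.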


    If the response helped, do "Accept Answer". If it doesn't work, please let us know the progress. All community members with similar issues will benefit by doing so. Your contribution is highly appreciated.


0 additional answers