Salesforce connector: I am getting a TaskCanceledException error while doing a full load for the Service__c and Service_Step__c entities.

Dhaval Uday Shah 20 Reputation points
2024-10-01T07:32:09.8266667+00:00

Hello,

We have recently transitioned to the Salesforce Bulk API 2.0 Connector in Azure Data Factory (ADF) for our integration processes. As part of our use case, we perform a full data load for all Salesforce entities once a month. However, we are encountering issues when attempting to load entities with data volumes exceeding 1 million records.

The following error is observed:


Failure happened on 'Source' side. ErrorCode=SalesforceAPITaskCancelException, 'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException, Message=Getting an unexpected TaskCanceledException when sending request to Salesforce API even after multiple retries!, Source=Microsoft.Connectors.Salesforce,' 'Type=System.Threading.Tasks.TaskCanceledException, Message=A task was canceled., Source=mscorlib,'

We have already tried extending the activity timeout to the maximum of 7 days, but the issue persists.

Upon researching, I came across suggestions to load the data in chunks to handle large datasets effectively. However, I am unsure how to implement this chunking mechanism in Azure Data Factory for Salesforce Bulk API 2.0.

Could you kindly provide guidance or suggestions on how to resolve this issue or successfully configure chunked data loading in ADF?

Thank you for your assistance.

Azure Data Factory

Accepted answer
  Vinodh247 22,871 Reputation points
    2024-10-01T13:49:10.0166667+00:00

    Hi Dhaval Uday Shah,

    Thanks for reaching out to Microsoft Q&A.

    1. Determine Chunking Logic
    • Decide how to split the data. Common options include chunking by:
      • Record count (e.g., 100k records per chunk)
      • A range of IDs (the primary key)
      • Time ranges (if your data has a CreatedDate or LastModifiedDate field)
    2. Set Up a Parameterized Pipeline
    • Create a parameterized pipeline in ADF that handles one chunk of data at a time. You can use a ForEach activity to iterate over the chunks (see the sketch after this list).
    3. Modify the Salesforce Source Query
    • In the source of your copy activity, modify the SOQL query to fetch data in chunks by adding a filter that matches your chunking logic (e.g., WHERE Id >= @chunkStart AND Id <= @chunkEnd, or WHERE CreatedDate >= @startDate AND CreatedDate < @endDate; note that SOQL has no BETWEEN operator).
    • Use pipeline parameters to pass the dynamic chunk values (e.g., chunkStart, chunkEnd, startDate, endDate) into the query.
    4. Create a Lookup Activity
    • Use a Lookup activity first to get the total number of records in the Salesforce object. This helps you calculate how to divide the data into chunks when chunking by record count.
    5. Use a ForEach Activity
    • After determining the chunks, configure a ForEach activity to loop over them. Inside the ForEach loop, pass the chunk parameters dynamically into the Salesforce source query.
    6. Retry Mechanism
    • Configure the retry policy on your ADF activities so that transient issues with the Salesforce API are retried automatically.
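
    A minimal sketch of steps 2 through 6, assuming monthly CreatedDate windows. The parameter names (loadStart, monthCount) are ones I am making up for illustration, and the SOQL field list is only an example; repeat or parameterize the same pipeline for Service_Step__c.

        Pipeline parameters:
            loadStart   (String)  e.g. 2015-01-01T00:00:00Z   -- start of the oldest chunk
            monthCount  (Int)     e.g. 120                    -- number of one-month chunks to cover

        ForEach activity -> Settings -> Items (dynamic content):
            @range(0, pipeline().parameters.monthCount)

        Copy activity (inside the ForEach) -> Source -> Query (dynamic content):
            @concat(
                'SELECT Id, Name, CreatedDate FROM Service__c',
                ' WHERE CreatedDate >= ',
                formatDateTime(addToTime(pipeline().parameters.loadStart, item(), 'Month'), 'yyyy-MM-ddTHH:mm:ssZ'),
                ' AND CreatedDate < ',
                formatDateTime(addToTime(pipeline().parameters.loadStart, add(item(), 1), 'Month'), 'yyyy-MM-ddTHH:mm:ssZ')
            )

        Copy activity -> General -> Retry: 3, Retry interval (sec): 120

    Each iteration then sends a query such as SELECT Id, Name, CreatedDate FROM Service__c WHERE CreatedDate >= 2015-01-01T00:00:00Z AND CreatedDate < 2015-02-01T00:00:00Z (SOQL datetime literals are not quoted). If one month is still too large for a single Bulk API job, shrink the window to weeks or days rather than raising the timeout further.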

    Concurrency: Run the ForEach activity in parallel (leave Sequential unchecked and set a Batch count) so that several chunks are copied simultaneously, which speeds up the overall load.

    Batch Size: The Bulk API 2.0 has its own batch size limits (e.g., around 10k records per batch), so set your chunk size accordingly and avoid overloading the API.
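
    For the parallel run, the relevant ForEach settings look like this (a Batch count of 4 is only an example; pick a value your Salesforce API limits can tolerate, since every iteration opens its own Bulk API 2.0 job):

        ForEach activity -> Settings:
            Sequential:  unchecked    (isSequential: false in the pipeline JSON)
            Batch count: 4            (batchCount: 4, i.e. at most 4 chunks copied concurrently)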

    Please 'Upvote' (Thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.

    1 person found this answer helpful.

0 additional answers
