Hi Dhaval Uday Shah,
Thanks for reaching out to Microsoft Q&A.
- Determine Chunking Logic
  - You need to decide on the chunking logic. Common options include chunking based on:
    - Record count (e.g., 100k records per batch)
    - A range of IDs (by the primary key)
    - Time ranges (if your data has a `CreatedDate` or `ModifiedDate` field), as sketched below
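For example, a date-range chunk list could look like the following (a hypothetical sketch; the `startDate`/`endDate` field names are illustrative and are reused in the sketches further down). An array like this can be passed in as a pipeline parameter and fed to the ForEach activity described below:

```json
[
  { "startDate": "2024-01-01T00:00:00Z", "endDate": "2024-04-01T00:00:00Z" },
  { "startDate": "2024-04-01T00:00:00Z", "endDate": "2024-07-01T00:00:00Z" },
  { "startDate": "2024-07-01T00:00:00Z", "endDate": "2024-10-01T00:00:00Z" }
]
```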
- Set Up a Parameterized Pipeline
  - Create a parameterized pipeline in ADF that handles a single chunk of data (a minimal sketch follows), then use a ForEach activity in the parent pipeline to iterate over the chunks.
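A minimal sketch of such a child pipeline's parameter block, assuming the `startDate`/`endDate` chunking above (the pipeline name is hypothetical, and the activities array, which would hold the Copy activity from the next step, is left empty for brevity):

```json
{
  "name": "CopySalesforceChunk",
  "properties": {
    "parameters": {
      "startDate": { "type": "String" },
      "endDate": { "type": "String" }
    },
    "activities": []
  }
}
```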
- Modify Salesforce Source Query
  - In the source of your ADF pipeline, modify the Salesforce source query to fetch data in chunks. You can add a filter to the SOQL query based on the chunking logic (e.g., `WHERE Id >= @chunkStart AND Id <= @chunkEnd` or `WHERE CreatedDate >= @startDate AND CreatedDate < @endDate`; note that SOQL has no `BETWEEN` operator, so use explicit comparisons).
  - You can use pipeline parameters to pass dynamic chunk values (e.g., `chunkStart`, `chunkEnd`, `startDate`, `endDate`) to the query, as in the sketch below.
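A sketch of the Copy activity source with a parameterized SOQL query. This assumes the legacy `SalesforceSource` type and an `Account` extract (property names differ slightly for the newer Salesforce V2 connector), and it relies on the parameters holding SOQL datetime literals such as `2024-01-01T00:00:00Z`, which SOQL expects unquoted:

```json
"source": {
  "type": "SalesforceSource",
  "query": {
    "value": "SELECT Id, Name, CreatedDate FROM Account WHERE CreatedDate >= @{pipeline().parameters.startDate} AND CreatedDate < @{pipeline().parameters.endDate}",
    "type": "Expression"
  }
}
```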
- Create a Lookup Activity
- Use a Lookup activity to first get the total number of records in the Salesforce object. This can help you calculate how to divide the data into chunks based on record count.
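For example, a Lookup along these lines can return the total (a sketch; the dataset name is hypothetical, and this assumes the connector accepts aggregate SOQL). SOQL allows an alias on the aggregate, so the value is then available as `@{activity('GetRecordCount').output.firstRow.cnt}`:

```json
{
  "name": "GetRecordCount",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "SalesforceSource",
      "query": "SELECT COUNT(Id) cnt FROM Account"
    },
    "dataset": {
      "referenceName": "SalesforceAccountDataset",
      "type": "DatasetReference"
    },
    "firstRowOnly": true
  }
}
```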
- Use a ForEach Activity
- After retrieving the total number of records, configure a ForEach activity to loop through your chunks. Inside the ForEach loop, you can dynamically pass the chunk parameters to the source Salesforce connector query.
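A sketch of the ForEach wired to an Execute Pipeline activity; `chunkList`, `CopySalesforceChunk`, and the parameter names are assumptions carried over from the earlier sketches. Setting `isSequential` to false together with `batchCount` gives the parallel processing mentioned under Concurrency below:

```json
{
  "name": "ForEachChunk",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@pipeline().parameters.chunkList",
      "type": "Expression"
    },
    "isSequential": false,
    "batchCount": 4,
    "activities": [
      {
        "name": "RunChunkCopy",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": {
            "referenceName": "CopySalesforceChunk",
            "type": "PipelineReference"
          },
          "parameters": {
            "startDate": { "value": "@item().startDate", "type": "Expression" },
            "endDate": { "value": "@item().endDate", "type": "Expression" }
          },
          "waitOnCompletion": true
        }
      }
    ]
  }
}
```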
- Retry Mechanism
  - Ensure that a retry policy is configured on your ADF activities, so that the copy is retried automatically on transient Salesforce API failures.
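Retry is configured per activity in its `policy` block; the values here are illustrative:

```json
"policy": {
  "retry": 3,
  "retryIntervalInSeconds": 60,
  "timeout": "0.02:00:00"
}
```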
Concurrency Mode: Use parallel mode (`isSequential` set to false on the ForEach, as in the sketch above) to allow multiple chunks to be processed simultaneously, which can speed up the process.
Batch Size: The Bulk API 2.0 has a maximum batch size limit (e.g., 10k records per batch), so ensure that your chunk size and batch size are set appropriately to avoid overloading the API.
Please 'Upvote' (Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.