Troubleshoot pipeline orchestration and triggers in Azure Data Factory
APPLIES TO: Azure Data Factory Azure Synapse Analytics
Tip
Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!
A pipeline run in Azure Data Factory defines an instance of a pipeline execution. For example, let's say you have a pipeline that runs at 8:00 AM, 9:00 AM, and 10:00 AM. In this case, there are three separate pipeline runs. Each pipeline run has a unique pipeline run ID. A run ID is a globally unique identifier (GUID) that defines that particular pipeline run.
Pipeline runs are typically instantiated by passing arguments to parameters that you define in the pipeline. You can run a pipeline either manually or by using a trigger. See Pipeline execution and triggers in Azure Data Factory for details.
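For example, a minimal sketch of starting a run on demand through the REST API: send a POST to https://management.azure.com/subscriptions/<subscription>/resourceGroups/<resource group>/providers/Microsoft.DataFactory/factories/<data factory name>/pipelines/<pipeline name>/createRun?api-version=2018-06-01, with the pipeline's parameter values as the request body. The parameter names and values below are placeholders.

```json
{
  "sourceFolder": "input/2024-01-01",
  "rowLimit": 1000
}
```

The response contains the runId GUID for the new pipeline run.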
Common issues, causes, and solutions
An Azure Functions app pipeline throws an error with private endpoint connectivity
You have a data factory and a function app running on a private endpoint in Azure. You're trying to run a pipeline that interacts with the function app. You've tried three different methods, but one returns error "Bad Request," and the other two methods return "103 Error Forbidden."
Cause
Azure Data Factory currently doesn't support a private endpoint connector for function apps. Azure Functions rejects calls because it's configured to allow only connections from a private link.
Resolution
Create a PrivateLinkService endpoint and provide your function app's DNS.
A pipeline run is canceled but the monitor still shows progress status
Cause
When you cancel a pipeline run, pipeline monitoring often still shows the progress status. This happens because of a browser cache issue. You also might not have the correct monitoring filters.
Resolution
Refresh the browser and apply the correct monitoring filters.
You see a "DelimitedTextMoreColumnsThanDefined" error when copying a pipeline
Cause
If a folder you're copying contains files with different schemas, such as a variable number of columns, different delimiters, quote character settings, or other data issues, the pipeline might throw this error:
Operation on target Copy_sks failed: Failure happened on 'Sink' side. ErrorCode=DelimitedTextMoreColumnsThanDefined, 'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException, Message=Error found when processing 'Csv/Tsv Format Text' source '0_2020_11_09_11_43_32.avro' with row number 53: found more columns than expected column count 27., Source=Microsoft.DataTransfer.Common,'
Resolution
Select the Binary Copy option while creating the Copy activity. This way, for bulk copies or migrating your data from one data lake to another, Data Factory won't open the files to read the schema. Instead, Azure Data Factory treats each file as binary and copies it to the other location.
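As an illustration, a trimmed sketch of a Copy activity that performs a binary copy between two Binary-format datasets. The dataset and store settings names are placeholders; your store types might differ.

```json
{
  "name": "Copy_Binary",
  "type": "Copy",
  "inputs": [ { "referenceName": "SourceBinaryDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "SinkBinaryDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "BinarySource", "storeSettings": { "type": "AzureBlobStorageReadSettings", "recursive": true } },
    "sink": { "type": "BinarySink", "storeSettings": { "type": "AzureBlobStorageWriteSettings" } }
  }
}
```

Because both datasets use the Binary format, no schema is read or validated during the copy.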
A pipeline run fails when you reach the capacity limit of the integration runtime for data flow
Issue
Error message:
Type=Microsoft.DataTransfer.Execution.Core.ExecutionException,Message=There are substantial concurrent MappingDataflow executions which is causing failures due to throttling under Integration Runtime 'AutoResolveIntegrationRuntime'.
Cause
You've reached the integration runtime's capacity limit. You might be running a large number of data flows on the same integration runtime at the same time. See Azure subscription and service limits, quotas, and constraints for details.
Resolution
- Run your pipelines at different trigger times.
- Create a new integration runtime, and split your pipelines across multiple integration runtimes, as in the sketch after this list.
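For example, a hedged sketch of a data flow activity that's pinned to a dedicated integration runtime instead of AutoResolveIntegrationRuntime. The data flow and runtime names are placeholders, and the exact property casing can vary with your factory's JSON version.

```json
{
  "name": "RunHeavyDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": { "referenceName": "TransformSalesDataFlow", "type": "DataFlowReference" },
    "integrationRuntime": { "referenceName": "DedicatedDataFlowIR", "type": "IntegrationRuntimeReference" },
    "compute": { "computeType": "General", "coreCount": 8 }
  }
}
```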
A pipeline run fails while invoking a REST API in a Web activity
Issue
Error message:
Operation on target Cancel failed: {"error":{"code":"AuthorizationFailed","message":"The client '<client>' with object id '<object>' does not have authorization to perform action 'Microsoft.DataFactory/factories/pipelineruns/cancel/action' over scope '/subscriptions/<subscription>/resourceGroups/<resource group>/providers/Microsoft.DataFactory/factories/<data factory name>/pipelineruns/<pipeline run id>' or the scope is invalid. If access was recently granted, please refresh your credentials."}}
Cause
Pipelines can use the Web activity to call ADF REST API methods only if the Azure Data Factory managed identity is assigned the Contributor role. You must first configure the data factory's managed identity and add it to the Contributor security role.
Resolution
Before you use the Azure Data Factory REST API in a Web activity's Settings tab, security must be configured. Azure Data Factory pipelines can use the Web activity to call ADF REST API methods only if the Azure Data Factory managed identity is assigned the Contributor role. In the Azure portal, open your data factory resource, select Access control (IAM), and then select Add > Add role assignment. Assign the Contributor role to the data factory's managed identity.
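For example, a Web activity that cancels a pipeline run by calling the endpoint from the error message above and authenticating with the data factory's managed identity. The subscription, resource group, factory, and run ID values are placeholders.

```json
{
  "name": "CancelPipelineRun",
  "type": "WebActivity",
  "typeProperties": {
    "url": "https://management.azure.com/subscriptions/<subscription>/resourceGroups/<resource group>/providers/Microsoft.DataFactory/factories/<data factory name>/pipelineruns/<pipeline run id>/cancel?api-version=2018-06-01",
    "method": "POST",
    "body": "{}",
    "authentication": {
      "type": "MSI",
      "resource": "https://management.azure.com/"
    }
  }
}
```

The call succeeds only after the managed identity has the Contributor role assignment described above.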
How to check and branch on activity-level success and failure in pipelines
Cause
Azure Data Factory orchestration allows conditional logic and enables users to take different paths based upon the outcome of a previous activity. It allows four conditional paths: Upon Success (default pass), Upon Failure, Upon Completion, and Upon Skip.
Azure Data Factory evaluates the outcome of all leaf-level activities. Pipeline results are successful only if all leaves succeed. If a leaf activity was skipped, we evaluate its parent activity instead.
Resolution
- Implement activity-level checks by following How to handle pipeline failures and errors.
- Use Azure Logic Apps to monitor pipelines at regular intervals following Query By Factory.
- Visually monitor pipeline runs.
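For example, a minimal sketch of two downstream activities that branch on the outcome of an upstream activity named Copy_sks (whose definition is omitted here). The pipeline names are placeholders; the available dependency conditions are Succeeded, Failed, Completed, and Skipped.

```json
{
  "activities": [
    {
      "name": "OnSuccessPath",
      "type": "ExecutePipeline",
      "dependsOn": [ { "activity": "Copy_sks", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": { "pipeline": { "referenceName": "ProcessDataPipeline", "type": "PipelineReference" } }
    },
    {
      "name": "OnFailurePath",
      "type": "ExecutePipeline",
      "dependsOn": [ { "activity": "Copy_sks", "dependencyConditions": [ "Failed" ] } ],
      "typeProperties": { "pipeline": { "referenceName": "NotifyFailurePipeline", "type": "PipelineReference" } }
    }
  ]
}
```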
How to monitor pipeline failures in regular intervals
Cause
You might need to monitor failed Azure Data Factory pipelines at intervals, say every 5 minutes. You can query and filter the pipeline runs from a data factory by using the Query By Factory endpoint.
Resolution
- You can set up an Azure Logic App to query all of the failed pipelines every 5 minutes, as described in Query By Factory, and then report incidents to your ticketing system. A sketch of the query request appears after this list.
- You can rerun pipelines and activities as described here.
- You can rerun activities if you canceled an activity or had a failure, as described in Rerun from activity failures.
- Visually monitor pipeline runs.
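For example, a hedged sketch of the Query By Factory request that a Logic App recurrence could send every 5 minutes: POST to https://management.azure.com/subscriptions/<subscription>/resourceGroups/<resource group>/providers/Microsoft.DataFactory/factories/<data factory name>/queryPipelineRuns?api-version=2018-06-01 with a body like the following. The timestamps are placeholders that the Logic App would compute from its schedule.

```json
{
  "lastUpdatedAfter": "2024-01-01T08:00:00Z",
  "lastUpdatedBefore": "2024-01-01T08:05:00Z",
  "filters": [
    { "operand": "Status", "operator": "Equals", "values": [ "Failed" ] }
  ]
}
```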
Degree of parallelism increase doesn't result in higher throughput
Cause
The degree of parallelism in ForEach is the maximum degree of parallelism. We can't guarantee a specific number of executions happening at the same time, but this parameter guarantees that we never go above the value that was set. Treat it as a limit, to be applied when controlling concurrent access to your sources and sinks.
Known facts about ForEach
- ForEach has a property called batch count (n), where the default value is 20 and the max is 50.
- The batch count, n, is used to construct n queues.
- Every queue runs sequentially, but you can have several queues running in parallel.
- The queues are precreated. This means there's no rebalancing of the queues during the runtime.
- At any time, you have at most one item being processed per queue. This means at most n items are being processed at any given time.
- The ForEach total processing time is equal to the processing time of the longest queue. This means that the ForEach activity's duration depends on how the queues are constructed.
Resolution
- You shouldn't use the Set Variable activity inside a ForEach activity that runs in parallel.
- Taking into consideration the way the queues are constructed, you can improve ForEach performance by setting up multiple ForEach activities, where each ForEach has items with similar processing time.
- This ensures that long runs are processed in parallel rather than sequentially. A minimal ForEach definition appears after this list.
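A minimal sketch of a ForEach activity that runs in parallel with an explicit batch count. The items expression and inner activity are placeholders; a Wait activity stands in for whatever you run per item.

```json
{
  "name": "ProcessItemsInParallel",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 20,
    "items": { "value": "@pipeline().parameters.fileNames", "type": "Expression" },
    "activities": [
      { "name": "ProcessOneItem", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 1 } }
    ]
  }
}
```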
Pipeline status is queued or stuck for a long time
Cause
This can happen for various reasons, such as hitting concurrency limits, service outages, or network failures.
Resolution
Concurrency limit: If your pipeline has a concurrency policy, verify that there are no old pipeline runs still in progress.
Monitoring limits: Go to the authoring canvas, select your pipeline, and determine whether it has a concurrency property assigned to it (a sketch of the property appears at the end of this section). If it does, go to the Monitoring view, and make sure there's nothing in the past 45 days that's in progress. If there's something in progress, you can cancel it, and the new pipeline run should start.
Transient issues: It's possible that your run was impacted by a transient network issue, credential failures, service outages, and so on. If this happens, Azure Data Factory has an internal recovery process that monitors all the runs and starts them when it notices something went wrong. You can rerun pipelines and activities as described here. You can rerun activities if you canceled an activity or had a failure, as described in Rerun from activity failures. This process happens every hour, so if your run is stuck for more than an hour, create a support case.
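For reference, the pipeline-level concurrency property looks like the following (the pipeline name and value are illustrative). When it's set, new runs queue behind in-progress runs until a slot frees up.

```json
{
  "name": "MyPipeline",
  "properties": {
    "concurrency": 1,
    "activities": [ ]
  }
}
```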
Longer start-up times for activities in ADF Copy and Data Flow
Cause
This can happen if you haven't implemented the time to live (TTL) feature for data flows or optimized the self-hosted integration runtime (SHIR).
Resolution
- If each copy activity takes up to 2 minutes to start, and the problem occurs primarily on a virtual network join (compared with the Azure IR), this can be a copy performance issue. To review troubleshooting steps, go to Copy Performance Improvement.
- You can use the time to live feature to decrease cluster start-up time for data flow activities. Review Data Flow Integration Runtime. A sketch of the setting appears after this list.
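A hedged sketch of an Azure integration runtime for data flows with a time to live, assuming General purpose compute and a 10-minute TTL; the name and values are illustrative.

```json
{
  "name": "DataFlowAzureIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "timeToLive": 10
        }
      }
    }
  }
}
```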
Hitting capacity issues in SHIR (self-hosted integration runtime)
Cause
This can happen if you haven't scaled up the SHIR to match your workload.
Resolution
- If you encounter a capacity issue from the SHIR, scale up the VM or add nodes to balance the activities. If you receive an error message about a self-hosted IR general failure or error, a self-hosted IR upgrade, or self-hosted IR connectivity issues, which can generate a long queue, go to Troubleshoot self-hosted integration runtime.
Error messages due to long queues for ADF Copy and Data Flow
Cause
Long queue-related error messages can appear for various reasons.
Resolution
- If you receive an error message from any source or destination via connectors, which can generate a long queue, go to Connector Troubleshooting Guide.
- If you receive an error message about Mapping Data Flow, which can generate a long queue, go to Data Flows Troubleshooting Guide.
- If you receive an error message about other activities, such as Databricks, custom activities, or HDI, which can generate a long queue, go to Activity Troubleshooting Guide.
- If you receive an error message about running SSIS packages, which can generate a long queue, go to the Azure-SSIS Package Execution Troubleshooting Guide and Integration Runtime Management Troubleshooting Guide.
Error message - "code":"BadRequest", "message":"Null"
Cause
This is a user error: the JSON payload that hits management.azure.com is corrupt. No logs are stored because the call didn't reach the ADF service layer.
Resolution
Perform a network trace of your API call from the ADF portal by using Microsoft Edge or Chrome browser developer tools. You'll see the offending JSON payload, which could be due to a special character (for example, $), spaces, or other types of user input. Once you fix the string expression, you can proceed with the rest of your ADF calls in the browser.
ForEach activities don't run in parallel mode
Cause
You're running ADF in debug mode.
Resolution
Execute the pipeline in trigger mode.
Can't publish because account is locked
Cause
You made changes in the collaboration branch to remove the storage event trigger. You're trying to publish and encounter a "Trigger deactivation error" message.
Resolution
This happens because the storage account used for the event trigger is locked. Unlock the storage account.
Expression builder fails to load
Cause
The expression builder can fail to load due to network or cache problems with the web browser.
Resolution
Upgrade the web browser to the latest version of a supported browser, clear cookies for the site, and refresh the page.
"Code":"BadRequest","message":"ErrorCode=FlowRunSizeLimitExceeded
Cause
You've chained too many activities in a single pipeline.
Resolution
You can split your pipeline into sub-pipelines and stitch them together with the Execute Pipeline activity, as in the sketch below.
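For example, a minimal sketch of a parent pipeline that calls two sub-pipelines in sequence with Execute Pipeline activities. The pipeline names are placeholders.

```json
{
  "activities": [
    {
      "name": "RunStagePipeline",
      "type": "ExecutePipeline",
      "typeProperties": {
        "pipeline": { "referenceName": "Stage_SubPipeline", "type": "PipelineReference" },
        "waitOnCompletion": true
      }
    },
    {
      "name": "RunTransformPipeline",
      "type": "ExecutePipeline",
      "dependsOn": [ { "activity": "RunStagePipeline", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "pipeline": { "referenceName": "Transform_SubPipeline", "type": "PipelineReference" },
        "waitOnCompletion": true
      }
    }
  ]
}
```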
How to optimize pipelines with mapping data flows to avoid internal server errors, concurrency errors, and so on during execution
Cause
You haven't optimized the mapping data flow.
Resolution
- Use memory-optimized compute when dealing with large amounts of data and transformations.
- Reduce the batch size in the case of a ForEach activity.
- Scale up your databases and warehouses to match the performance of your ADF.
- Use a separate integration runtime (IR) for activities running in parallel.
- Adjust the partitions at the source and sink accordingly.
- Review Data Flow Optimizations.
Error Code "BadRequest" when passing parameters to child pipelines
Cause
The failure type is a user configuration issue: a string of parameters, instead of an array, is passed to the child pipeline.
Resolution
In the Execute Pipeline activity, set the pipeline parameter to @createArray('a','b'), for example, if you want to pass the parameters 'a' and 'b'. If you want to pass numbers, use @createArray(1,2,3), for example. Use the createArray function to force parameters to be passed as an array, as in the sketch below.
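For example, a sketch of an Execute Pipeline activity that passes an array parameter by using createArray. The child pipeline and parameter names are placeholders, and the exact JSON shape that the authoring UI produces might differ slightly.

```json
{
  "name": "RunChildPipeline",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": { "referenceName": "ChildPipeline", "type": "PipelineReference" },
    "waitOnCompletion": true,
    "parameters": {
      "fileNames": { "value": "@createArray('a','b')", "type": "Expression" }
    }
  }
}
```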
Related content
For more troubleshooting help, try these resources: