Broadcast join timeout error

T Venkat Anurag 20 Reputation points
2023-06-27T12:03:46.0033333+00:00

The pipelines are scheduled to refresh every day, and they used to execute successfully. For the past 2 days, however, I have been getting an error: "Broadcast join timeout error". The pipelines take longer than usual to execute and eventually fail.

Can anyone help me with this, please?

Azure SQL Database
Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-06-28T17:19:23.8233333+00:00

    Hi T Venkat Anurag,

    Welcome to the Microsoft Q&A platform, and thanks for posting your question here.

    As I understand it, you are getting a broadcast join timeout error in a mapping data flow. Please let me know if I have misunderstood.

    If the broadcast data is too large for the Spark node, you may get an out-of-memory error. To avoid out-of-memory errors, use memory-optimized clusters. If you experience broadcast timeouts during data flow executions, you can switch off the broadcast optimization. However, this will result in slower data flows.

    When working with data sources that can take longer to query, such as large database queries, it is recommended to turn broadcast off for joins. Sources with long query times can cause Spark timeouts when the cluster attempts to broadcast to compute nodes.

    Another good case for turning off broadcast is when you have a stream in your data flow that is aggregating values for use in a lookup transformation later. This pattern can confuse the Spark optimizer and cause timeouts.
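
    In the designer, broadcast is switched off per transformation, via the Broadcast option on the Optimize tab of the join, lookup, or exists transformation. If you would rather check or change it in the data flow script, the sketch below may help. It assumes the setting appears in the script as an option such as broadcast: 'auto' and that 'off' is the value that disables it; please verify both against a script exported from your own data flow. The helper name disable_broadcast and the sample stream and column names are hypothetical.

```python
import re

# Assumption: join, lookup, and exists transformations in a data flow script
# carry an option of the form  broadcast: 'auto'  (or 'left', 'right', 'both').
_BROADCAST = re.compile(r"(broadcast\s*:\s*)'(\w+)'")


def disable_broadcast(script: str) -> str:
    """Return the script text with every broadcast option set to 'off'."""
    return _BROADCAST.sub(r"\1'off'", script)


# Hypothetical script fragment; the stream and column names are made up.
sample = (
    "Orders, Customers join(Orders@CustomerId == Customers@CustomerId,\n"
    "    joinType:'inner',\n"
    "    broadcast: 'auto')~> JoinCustomers"
)

print(disable_broadcast(sample))
```

    This only rewrites text, so treat it as a way to spot and flip the setting consistently across many transformations, not as a replacement for testing the data flow after the change.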

    For more details, kindly check out the below documentation:

    Optimizing Joins, Exists, and Lookups via Broadcasting

    Hope it helps. Kindly accept the answer if you found it helpful. Thank you.

