"Non-equality lookups should have broadcasted the right side" error in Azure Data Factory

Monica Manoharan 66 Reputation points
2022-04-19T11:34:42.277+00:00

Hello Team,

I have a data flow with a conditional split. In the Non availability branch, I have another lookup and a conditional split. When I run this, the pipeline throws the error "Non-equality lookups should have broadcasted the right side". So I tried changing the Broadcast option of the Lookup from Auto to Fixed and selected the right side. That resulted in a different error, shown below.

Operation on target Final Fact Load EUS failed: {"StatusCode":"DF-Executor-BroadcastFailure","Message":"Job failed due to reason: Dataflow execution failed during broadcast exchange. Potential causes include misconfigured connections at sources or a broadcast join timeout error. To ensure the sources are configured correctly, please test the connection or run a source data preview in a Dataflow debug session. To avoid the broadcast join timeout, you can choose the 'Off' broadcast option in the Join/Exists/Lookup transformations. If you intend to use the broadcast option to improve performance then make sure broadcast streams can produce data within 60 secs for debug runs and within 300 secs for job runs. If problem persists, contact customer support.","Details":"org.apache.spark.SparkException: Exception thrown in Future.get: \n\tat org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:195)\n\tat org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:167)\n\tat org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:155)\n\tat org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:187)\n\tat org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n\tat org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:183)\n\tat org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:155)\n\tat org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.doExecute(BroadcastNestedLoopJoinExec.scala:357)\n\tat org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146)\n\tat org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:134)"}

Can you please suggest a solution for this issue? The highlighted transformations are the lookups in the non-equality branch.

194312-image.png

Azure Data Factory

1 answer

  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2022-04-20T22:12:37.017+00:00

    Hello @Monica Manoharan ,

    Thanks for the question and for using the Microsoft Q&A platform.

    The first error occurred because non-equi joins (lookups whose condition is not a plain equality) require at least one of the two streams to be broadcast using Fixed broadcasting in the Optimize tab.

    For the second error message, it looks like something is wrong with the data source configuration. Please refer to this troubleshooting guide, which lists possible root causes and recommended resolutions: Error code: DF-Executor-BroadcastFailure

    194847-image.png

    Please verify your configuration as recommended in this troubleshooting guide and let us know how it goes.
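
As a rough illustration of why a non-equality lookup needs one side broadcast, here is a conceptual sketch in plain Python. It is not ADF or Spark code, and every function name and sample row in it is hypothetical. With an equality condition, rows can be matched by hashing the key, so each worker only needs the right-side rows for its own keys. With a non-equality condition there is no key to hash on, so every left row has to be compared against the whole right side, which means each worker needs a full copy of it. The `BroadcastNestedLoopJoinExec` frame in the stack trace above suggests Spark planned the lookup along these lines.

```python
# Conceptual sketch only: plain Python, not ADF or Spark code.
# All names and sample rows are hypothetical.

def hash_lookup(left, right, key):
    """Equality lookup: match rows by hashing the key.
    Behaves like a left outer join (unmatched left rows pair with None)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for l in left:
        hits = index.get(l[key], [])
        out.extend((l, r) for r in hits) if hits else out.append((l, None))
    return out


def broadcast_nested_loop_lookup(left_partitions, right, condition):
    """Non-equality lookup: no key to hash on, so every partition gets a
    full copy of the right side and tests each left row against all of it."""
    out = []
    for partition in left_partitions:
        broadcast_copy = list(right)  # full right side sent to this worker
        for l in partition:
            hits = [r for r in broadcast_copy if condition(l, r)]
            if hits:
                out.extend((l, r) for r in hits)
            else:
                out.append((l, None))  # lookup keeps unmatched left rows
    return out


# Hypothetical fact rows, split across two partitions.
fact_partitions = [
    [{"order_id": 1, "amount": 40}],
    [{"order_id": 2, "amount": 150}, {"order_id": 3, "amount": 999}],
]
# Hypothetical range table: a band matches when low <= amount < high.
bands = [
    {"band": "small", "low": 0, "high": 100},
    {"band": "large", "low": 100, "high": 500},
]

result = broadcast_nested_loop_lookup(
    fact_partitions,
    bands,
    lambda l, r: r["low"] <= l["amount"] < r["high"],
)
matched = [(l["order_id"], r["band"] if r else None) for l, r in result]
print(matched)
```

Because the whole right side is copied to every worker, a large or slow right-side stream is what can run into the broadcast timeout quoted in the error message (60 secs for debug runs, 300 secs for job runs). Keeping the broadcast side small, or checking that its source returns data quickly in a data preview, is the first thing to look at.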

