Broadcast join
Today, regular joins are executed on a single cluster node. Broadcast join is an execution strategy of join that distributes the join over cluster nodes. This strategy is useful when the left side of the join is small (up to several tens of MBs). In this case, a broadcast join is more performant than a regular join.
Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.
Note
If the left side of the join is larger than several tens of MBs, the query will fail.
You can run the following query to estimate the size of the left side, in bytes:
leftSide
| summarize sum(estimate_data_size(*))
If left side of the join is a small dataset, then you may run join in broadcast mode using the following syntax (hint.strategy = broadcast):
leftSide
| join hint.strategy = broadcast (factTable) on key
The performance improvement is more noticeable in scenarios where the join is followed by other operators such as summarize
. See the following query for example:
leftSide
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key
Related content
Feedback
https://aka.ms/ContentUserFeedback.
Kommer snart: I hele 2024 udfaser vi GitHub-problemer som feedbackmekanisme for indhold og erstatter det med et nyt feedbacksystem. Du kan få flere oplysninger under:Indsend og få vist feedback om