Broadcast hash join

Question


Shambhu Rai 1,411 Reputation points

Hi experts, can we use a broadcast hash join to improve performance? Please help me with an example, if there is one, i.e. broadcast or reshuffle.

PRADEEPCHEEKATLA 91,866 Reputation points

2024-01-30T01:15:15.83+00:00

@Shambhu Rai - Just checking in to see if the answer below from @Amira Bedhiafi helped. If it answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

Answer 1

Amira Bedhiafi 42,936 Reputation points MVP Volunteer Moderator

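A broadcast hash join is a join strategy used in distributed computing environments such as Spark. It is particularly useful when one of the datasets in the join is significantly smaller than the other: the small dataset is copied to every executor, so the large dataset can be joined in place without being shuffled across the network. Here is an example: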

   # Example DataFrames
   largeDF = spark.read.format("...").load("...")
   smallDF = spark.read.format("...").load("...")

Here is the broadcast and the join:

   from pyspark.sql.functions import broadcast
   # Perform broadcast hash join
   joinedDF = largeDF.join(broadcast(smallDF), largeDF["key"] == smallDF["key"])

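smallDF is broadcast to all nodes, and the join is performed on the column "key". The operation is only executed when an action is called on joinedDF, such as joinedDF.show() or joinedDF.write.save("..."). Note that broadcast() is a hint that overrides Spark's own size check, so use it only when the small side fits comfortably in executor memory. Without the hint, Spark's optimizer broadcasts a table automatically when its estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default) and otherwise falls back to a shuffle-based join, typically a sort-merge join. You can also manually repartition your DataFrames if needed.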

Shambhu Rai 1,411 Reputation points

2024-01-24T04:22:46.7766667+00:00

One example of a reshuffle join, please.

Amira Bedhiafi 42,936 Reputation points MVP Volunteer Moderator

2024-01-24T10:51:00.0166667+00:00

Check the example I provided.

Shambhu Rai 1,411 Reputation points

2024-01-24T11:24:52.6833333+00:00

That is a broadcast hash join. Can you give an example of a reshuffle join, please?
