Hello @alex feng ,
Thanks for the question and using MS Q&A platform.
You cannot pass SQL join statement into the
spark.read.synapsesql
.
You can try the below to join two data frames as shown below.
For the demo purpose, I had created two tables in the Azure Synapse Dedicated pool named chepra
and the tables employee
which contains (EMP_ID and EMP_NAME) and employeedetails
which contains (EMP_ID, EMP_EMAIL and EMP_CONTACT) as shown below.
You can use any one of the below methods to join two data frames:
Method1:
%%spark
val empdf = spark.read.sqlanalytics("chepra.dbo.employee")
val empdetailsdf = spark.read.sqlanalytics("chepra.dbo.employeedetails")
val join = empdf.join(empdetailsdf, Seq("EMP_ID"), "inner")
join.show
Method2:
%%spark
val empdf = spark.read.sqlanalytics("chepra.dbo.employee")
val empdetailsdf = spark.read.sqlanalytics("chepra.dbo.employeedetails")
empdf.join(empdetailsdf,empdf("EMP_ID") === empdetailsdf("EMP_ID"),"inner").show(false)
For more details, refer to the below links:
Spark SQL Join Types with examples
Hope this helps. Do let us know if you any further queries.
---------------------------------------------------------------------------
Please "Accept the answer" if the information helped you. This will help us and others in the community as well.