Synapse Spark notebook reading dedicated pool using synapsesql() - how to query with SQL?

Question

I'm trying to query dedicated pool tables using the Synapse Spark notebooks. I'm referring to the document here:

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#usage

It seems to me that I can only pass the table name into the method, for example, if I need to join 2 tables, I need to read both tables into 2 data frame and join them using Spark.

val df_1 = spark.read.synapsesql("..")
val df_2 = spark.read.synapsesql("..")

then do the join in Spark.

I'm wondering if it's possible to pass the SQL join statement into the spark.read.synapsesql? Like this

val df = spark.read.synapsesql("SQL that join TableName_1 and TableName_2")

I have tried to put a simple SQL selection like 'select * from ..' and it complains for 'SyntaxError: invalid syntax'.

Any advice is greatly appreciated!

Accepted Answer

Hello @alex feng ,

Thanks for the question and using MS Q&A platform.

You cannot pass SQL join statement into the spark.read.synapsesql.

You can try the below to join two data frames as shown below.

For the demo purpose, I had created two tables in the Azure Synapse Dedicated pool named chepra and the tables employee which contains (EMP_ID and EMP_NAME) and employeedetails which contains (EMP_ID, EMP_EMAIL and EMP_CONTACT) as shown below.

You can use any one of the below methods to join two data frames:

Method1:

%%spark  
val empdf = spark.read.sqlanalytics("chepra.dbo.employee")   
val empdetailsdf = spark.read.sqlanalytics("chepra.dbo.employeedetails")   
val join = empdf.join(empdetailsdf, Seq("EMP_ID"), "inner")  
join.show

Method2:

%%spark  
val empdf = spark.read.sqlanalytics("chepra.dbo.employee")   
val empdetailsdf = spark.read.sqlanalytics("chepra.dbo.employeedetails")   
empdf.join(empdetailsdf,empdf("EMP_ID") === empdetailsdf("EMP_ID"),"inner").show(false)

For more details, refer to the below links:

Spark SQL Join Types with examples

Joins in Apache Spark

Hope this helps. Do let us know if you any further queries.

---------------------------------------------------------------------------

Please "Accept the answer" if the information helped you. This will help us and others in the community as well.

Share via

Synapse Spark notebook reading dedicated pool using synapsesql() - how to query with SQL?

0 additional answers