Synapse Python/Spark Notebook code with SQL to query data from linked service to SQL Server

ylycfj88 5 Reputation points
2024-04-24T13:45:40.7233333+00:00

In Synapse, I have a linked service to a SQL Server database. I'm looking for sample notebook code, in either Python or Spark, that runs a complex SQL query to get data from multiple SQL Server tables, puts the results in a dataframe, and then saves it to a Gen2 data set.

This should be the most basic step that many analysts/data scientists need when working with databases, but I can't find any examples in the Synapse gallery.

Thanks !!

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.

1 answer

  1. Smaran Thoomu 9,760 Reputation points Microsoft Vendor
    2024-04-25T05:15:43.54+00:00

    Hi ylycfj88,

    Thanks for the question and using MS Q&A platform.

    To query data from a SQL Server database linked service in Synapse using Python or Spark, you can use the following steps:

    • First, you need a Synapse workspace with a linked service to your SQL Server database. You can follow the instructions in this doc to create the workspace and the linked service.
    • Once your SQL Server database is linked to Synapse, create a new notebook in Synapse Studio. In the notebook, you can use PySpark to connect to the database and execute SQL queries. Here's an example code snippet that queries data from a SQL Server database, puts the results in a dataframe, and saves it to a Gen2 data set:
    # Import the necessary libraries
    from pyspark.sql import SparkSession
    
    # Create a SparkSession (in a Synapse notebook, a session named `spark` already exists)
    spark = SparkSession.builder.appName("Query SQL Server").getOrCreate()
    
    # Define the SQL query; for JDBC it must be wrapped in parentheses and given an alias
    query = "(SELECT * FROM table1 JOIN table2 ON table1.id = table2.id) AS joined_tables"
    
    # Define the connection properties
    connectionProperties = {
      "user": "<username>",
      "password": "<password>",
      "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    }
    
    # Define the JDBC URL
    jdbcUrl = "jdbc:sqlserver://<server>:<port>;database=<database>"
    
    # Load the query results into a dataframe
    df = spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties)
    
    # Save the dataframe to the Gen2 data set as snappy-compressed Avro
    # (Spark 2.4+ ships a built-in "avro" format; the external com.databricks.spark.avro package is not needed)
    df.write.format("avro").option("compression", "snappy").mode("overwrite").save("<Gen2 data set path>")
    

    In this code snippet, replace <username>, <password>, <server>, <port>, <database>, and <Gen2 data set path> with your specific values.
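    To make the placeholder substitution less error-prone, the string-building steps above can be sketched as small helper functions. The names (`wrap_subquery`, `build_jdbc_url`, `build_abfss_path`) are illustrative only, not part of any Synapse or Spark API; this is a minimal sketch assuming a standard SQL Server JDBC URL and an ADLS Gen2 `abfss://` path.

```python
# Illustrative helpers (hypothetical names, not a Synapse API) for building
# the strings passed to spark.read.jdbc and to the Gen2 save path.

def wrap_subquery(sql: str, alias: str = "subq") -> str:
    """JDBC's `table` argument expects a table name or a parenthesized,
    aliased subquery, so wrap a raw SELECT accordingly."""
    return f"({sql}) AS {alias}"

def build_jdbc_url(server: str, database: str, port: int = 1433) -> str:
    """Assemble a SQL Server JDBC URL in the form used in the snippet above."""
    return f"jdbc:sqlserver://{server}:{port};database={database}"

def build_abfss_path(container: str, account: str, folder: str) -> str:
    """Assemble an ADLS Gen2 path of the form
    abfss://<container>@<account>.dfs.core.windows.net/<folder>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder.lstrip('/')}"

# Example usage with made-up placeholder values:
query = wrap_subquery(
    "SELECT * FROM table1 JOIN table2 ON table1.id = table2.id",
    alias="joined_tables",
)
jdbc_url = build_jdbc_url("myserver.database.windows.net", "mydb")
gen2_path = build_abfss_path("mycontainer", "mystorageacct", "exports/joined")
```

    These helpers only format strings; authentication and network access are still governed by the linked service and workspace configuration.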

    You can refer to this video for more information about using PySpark to query data from SQL Server in Azure Synapse Analytics.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further queries, do let us know.