What is the best way to connect to Serverless SQL from a Synapse pySpark Notebook?

Tim 31 Reputation points
2022-07-28T20:27:08.677+00:00

I am currently trying to use the documentation for AD Based authentication at https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export?tabs=python%2Cpython1%2Cscala2%2Cscala3%2Cscala4%2Cscala5#read-using-azure-ad-based-authentication. It is almost working in that I can get to the Serverless SQL View but it is having issues showing the data. "An error occurred while calling o915.showString. : java.sql.SQLException: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near 'HADOOP'." But I'm wondering if this is the best or only way to connect to the built-in Synapse Serverless pool.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,917 questions
{count} vote

2 answers

Sort by: Most helpful
  1. Tim 31 Reputation points
    2022-07-31T18:48:40.257+00:00

    Hello, I'm trying to read a serverless view from a pySpark notebook. I originally thought that I could using the synapsesql method from the com.microsoft.spark.sqlanalytics driver. I was encouraged because I authenticated and it read the database metadata. But it failed to convert the actual data. Now I assume that particular driver only works with Dedicated Pools.

    So now I've switched back to the com.microsoft.sqlserver.jdbc.spark driver to use for my serverless connection. I'm trying AD authentication with this driver but have been unsuccessful thus far. I receive error message ": com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user '<token-identified principal>'. Incorrect or invalid token." My code to retrieve the token and use it is:

    server_name = "jdbc:sqlserver://bi-synapse-dev-workspace-ondemand.sql.azuresynapse.net"
    database_name = "Bronze"
    url = server_name + ":1433;" + "databaseName=" + database_name + ";"
    print(url)

    table_name = "fo.CustTable"

    accessToken = mssparkutils.credentials.getToken('Synapse')

    df = (spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("accessToken", accessToken) \
    .load())


  2. Tim 31 Reputation points
    2022-08-03T00:18:35.89+00:00

    OK, thank you for your time. The AD-base db user with privileges was tried without success. Actually I will open a ticket with Support to first of all connect to an actual Azure SQL DB (i.e. not Serverless SQL). It turns I cannot even AD authenticate via pySpark to Azure SQL DB using the same code pattern. However I can authenticate with the less desirable username and password approach. I'm thinking that perhaps my attempt obtain the Synapse managed identity access token via "accessToken = mssparkutils.credentials.getToken('Synapse')" is not quite right. If I get an answer from Support then I'll try to come back here to update the thread.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.