I did get this working, and below are my notes along with the list of packages I added to the Spark pool.
Note 1: Everything in a serverless database lives in storage somewhere; the database doesn't have any storage of its own. If you can go straight to storage, that scenario is better supported. I can see scenarios where this isn't ideal, though, such as wanting to access views defined in the serverless DB.
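For the go-straight-to-storage case, here is a minimal sketch. It assumes the underlying files are parquet on an ADLS Gen2 account the pool's identity can read; the account, container, and path are placeholders:

%%pyspark
# Read the underlying files directly from ADLS Gen2 instead of
# going through the serverless SQL pool. The storage account,
# container, and folder below are placeholders.
path = "abfss://<container>@<storage account>.dfs.core.windows.net/<folder>/"
df = spark.read.parquet(path)
df.show(10)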
Note 2: There is a synapsesql() function for connecting between Spark and SQL pools. It only works on dedicated pools and is designed for data transfer only, so there are some limitations there.
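For completeness, this is roughly what the dedicated-pool path looks like. A minimal sketch, assuming a Synapse Spark 3 runtime where the connector exposes a Python API (earlier runtimes were Scala-only); the database, schema, and table names are placeholders:

%%pyspark
# Transfer data from a dedicated SQL pool table into a DataFrame.
# This only works against dedicated pools, not serverless.
df = spark.read.synapsesql("<dedicated pool DB>.<schema>.<table>")
df.show(10)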
Note 3: While SQL logins are technically supported by serverless pools, the fact that the login also needs permissions on the storage account makes it a less than ideal route; using AAD gives a much better result.
With that out of the way, here are the packages I added to my Spark pool. There are other dependencies, but they are already included in the base image.
- mssql-jdbc-9.2.1.jre8.jar
- spark-mssql-connector-1.0.1.jar
- msal4j-1.10.0.jar
- asm-8.0.1.jar
- content-type-2.1.jar
- lang-tag-1.5.jar
- oauth2-oidc-sdk-9.4.jar
Once the pool has updated, you can use the code below as a base template for accessing the serverless pool. The authentication property accepts 'ActiveDirectoryPassword' or 'ActiveDirectoryServicePrincipal'; MSI and interactive logins don't work at this time. You can also run ad hoc queries using the query property (see the example after the template).
%%pyspark
# Connection properties for the serverless SQL pool.
db_properties = {}
db_properties["authentication"] = "ActiveDirectoryServicePrincipal"
db_properties["AADSecurePrincipalId"] = "<app ID>"
db_properties["AADSecurePrincipalSecret"] = "<app secret>"
db_properties["driver"] = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

serverName = "<server name>"
databaseName = "<database name>"
tableName = "<table name>"

# Serverless pools use the -ondemand endpoint on the workspace.
url = f"jdbc:sqlserver://{serverName}-ondemand.sql.azuresynapse.net;databaseName={databaseName}"
data = spark.read.jdbc(url, tableName, properties=db_properties)
data.show(10)
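And here is the ad hoc query variant mentioned above. A minimal sketch that reuses the url and db_properties from the template; the SELECT statement is a placeholder:

%%pyspark
# Run an ad hoc query instead of reading a whole table by passing
# the 'query' option to the JDBC reader. Reuses url and
# db_properties from the template above.
adhoc = (spark.read.format("jdbc")
    .option("url", url)
    .option("query", "SELECT TOP 10 * FROM <table name>")
    .options(**db_properties)
    .load())
adhoc.show(10)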