Synapse can't find data source com.microsoft.sqlserver.jdbc.spark

Question

Mike Wong 46

Hello,

I am connecting to an Azure SQL database using Synapse notebooks and the following block of code:


df_config = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", f"{SchemaName}.{TableName}") \
    .option("databasename", Database) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()

This worked fine all week, but overnight I suddenly started getting this error:

Py4JJavaError: An error occurred while calling o3930.load. : java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at https://spark.apache.org/third-party-projects.html

I don't understand why I'm getting this error. I have allowed session packages on the Apache Spark pool, I am using the latest versions of Apache Spark and Python available to the cluster, and, most importantly, I have been using this code all week without changing anything. The error suggests that the data source does not exist, even though it's available out of the box in Synapse notebooks.

Does anybody have any answers? Thank you!

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-05-19T21:43:11.63+00:00

Hello Mike Wong,

Welcome to the MS Q&A platform.

I was able to run the same code on my end. Based on the error message, the problem could be with the Spark driver or executor.

Can you check the version of Apache Spark that you are using by running the following command:

print(sc.version)
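Alongside the version check, a small probe can show whether the pool can resolve the connector at all. The sketch below uses a hypothetical helper, `probe_mssql_connector` (not part of Spark or Synapse). It assumes that `DataFrameReader.load()` looks up the data source class before validating options, which is what the stack trace in this thread shows: a missing jar fails during the lookup with "Failed to find data source", while an installed connector fails later for a different reason (for example, the missing `url` option). It matches on message text instead of importing `Py4JJavaError`, and the later-version wordings it checks for are assumptions; treat it as a diagnostic sketch, not a supported API.

```python
MSSQL_SOURCE = "com.microsoft.sqlserver.jdbc.spark"

# Substrings that indicate the lookup itself failed.
NOT_FOUND_MARKERS = (
    "Failed to find data source",      # wording in the Spark 3.3 trace in this thread
    "Failed to find the data source",  # assumed wording in later Spark versions
    "DATA_SOURCE_NOT_FOUND",           # assumed error class in later Spark versions
)


def probe_mssql_connector(spark, source=MSSQL_SOURCE):
    """Hypothetical diagnostic: report the Spark version and whether `source` resolves.

    load() is called with no options on purpose. If the connector jar is
    missing, the data source lookup fails; if it is present, the lookup
    succeeds and load() fails later for another reason, which is treated
    here as "available".
    """
    try:
        spark.read.format(source).load()
    except Exception as exc:  # a Py4JJavaError in a real PySpark session
        message = str(exc)
        available = not any(marker in message for marker in NOT_FOUND_MARKERS)
    else:
        available = True
    return {"spark_version": spark.version, "connector_available": available}


# In a Synapse notebook, where `spark` is the active SparkSession:
# print(probe_mssql_connector(spark))
```

A result with `"connector_available": False` on a 3.3 pool would match the situation described in this thread.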
Mike Wong 46 Reputation points

2023-05-19T22:33:25.2633333+00:00

Hello, thank you for replying. I managed to work around the issue by downgrading my cluster's Apache Spark version to 3.1.

What's confusing is that I've been using 3.3 without any problems, and I can see you're using 3.2. Would you happen to know why I could connect using Spark 3.3 yesterday but can't today? Thank you!
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-05-19T22:39:57.0333333+00:00

Mike Wong,

Glad to know that you found a workaround. I'm not sure whether a change on the Synapse side (which would be a product issue) is causing this, but I can check with my internal team and get back to you with more details.

Bandhit Suksiri 21

@Bhargava-MSFT

Surprisingly, I'm facing the same problem as Mike Wong on Spark 3.3.

It ran without issues until 5/18/2023, 4:00:00 PM UTC.

Then, from 5/19/2023, 4:00:01 PM UTC until now, we have been getting exactly the same error:

Py4JJavaError: An error occurred while calling o3874.load.
: java.lang.ClassNotFoundException:
Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at
https://spark.apache.org/third-party-projects.html

Full logs:

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:184, in DataFrameReader.load(self, path, format, schema, **options)
    182     return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    183 else:
--> 184     return self._df(self._jreader.load())

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
    188 def deco(*a: Any, **kw: Any) -> Any:
    189     try:
--> 190         return f(*a, **kw)
    191     except Py4JJavaError as e:
    192         converted = convert_exception(e.java_exception)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o3874.load.
: java.lang.ClassNotFoundException: 
Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at
https://spark.apache.org/third-party-projects.html
       
	at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:587)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:215)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:173)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.spark.DefaultSource
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:661)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:661)
	at scala.util.Failure.orElse(Try.scala:224)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:661)
	... 14 more
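The `Caused by` line is the key detail. In Spark 3.3, the lookup first checks data sources registered under a short name (through `DataSourceRegister`); finding none, it tries to load the name itself as a class and then falls back to `<name>.DefaultSource`, which is the `Failure.orElse` visible in the trace. Both attempts failed here because the connector jar was not on the classpath. The Python sketch below models that fallback order in simplified form: it ignores Spark's built-in aliases, backward-compatibility mappings, and the error raised when several sources share a short name. `resolve_provider`, `registered_short_names`, and `classpath` are illustrative names, not Spark APIs.

```python
def resolve_provider(provider, registered_short_names, classpath):
    """Simplified, illustrative model of Spark's data source lookup order.

    registered_short_names: dict mapping a short name to its implementing
        class, standing in for sources registered through DataSourceRegister.
    classpath: set of fully qualified class names that can be loaded.
    """
    # 1. A registered short name wins (compared case-insensitively).
    for name, cls in registered_short_names.items():
        if name.lower() == provider.lower():
            return cls
    # 2. Otherwise try the name as a class, then "<name>.DefaultSource".
    for candidate in (provider, provider + ".DefaultSource"):
        if candidate in classpath:
            return candidate
    # 3. Both attempts failed: the "Failed to find data source" case.
    raise LookupError(f"Failed to find data source: {provider}")
```

With the jar installed, `classpath` would contain `com.microsoft.sqlserver.jdbc.spark.DefaultSource` and step 2 would succeed, which is consistent with adding the jar resolving the error.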

Accepted answer

Answer 1

Bhargava-MSFT 31,261 Microsoft Employee Moderator

Hello Mike Wong and Bandhit Suksiri,

This connector is still in beta, and there is no official stable version yet. For this reason, the spark-mssql-connector jar is missing from Spark 3.3, which causes the error you are seeing.

The resolution is to add this jar https://repo1.maven.org/maven2/com/microsoft/azure/spark-mssql-connector_2.12/1.3.0-BETA/spark-mssql-connector_2.12-1.3.0-BETA.jar to your Spark pools using the package management feature:

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries

This issue should be resolved once you add the jar.
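The jar link above follows Maven Central's standard repository layout: the group ID's dots become path segments, followed by the artifact ID, the version, and a file named `<artifactId>-<version>.jar`. The `_2.12` suffix in the artifact ID is the Scala binary version the build targets. The hypothetical helper `maven_jar_url` below rebuilds the link from the Maven coordinate `com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA`, which may be handy when looking up the download link for another release listed on Maven Central.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(coordinate, repo=MAVEN_CENTRAL):
    """Build the jar download URL for a 'group:artifact:version' Maven coordinate."""
    group_id, artifact_id, version = coordinate.split(":")
    group_path = group_id.replace(".", "/")  # com.microsoft.azure -> com/microsoft/azure
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


print(maven_jar_url("com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA"))
```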

Please see this video tutorial for how to add the jar file to the Spark pool.

I hope this helps. Please let me know if you have any further questions.

If this answers your question, please consider accepting the answer by hitting the Accept answer and up-vote as it helps the community look for answers to similar questions.

Mike Wong 46 Reputation points

2023-05-23T08:00:59.4+00:00

Hello BhargavaGunnam,

Thank you for your reply. Can I ask where you found the information that the connector is still in beta? I'm asking because it wasn't clear to me, it wasn't communicated anywhere during Spark pool creation, and this massively disrupted a big release before we went to production. I'd like to avoid that in the future if possible.

Thank you!
Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-05-23T14:26:28.5833333+00:00

Hello Mike Wong

Thank you for accepting the answer.

The provided information in the answer is from the Product Team. I hope this helps.
Yukti Saxena 0 Reputation points

2023-08-16T08:43:31.35+00:00

Is there a way to verify that the shared jar file is free of viruses or injected code? For example, can we identify the file's license, or use some other means to confirm that this external jar file is safe to add to our code?
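One partial check is integrity: Maven Central publishes a checksum file next to each artifact (for example, the jar URL with `.sha1` appended), and you can compare the digest of the file you downloaded with the published value. The sketch below does this with Python's standard `hashlib`; `file_digest_hex` and `matches_published_checksum` are hypothetical helper names. A matching checksum only shows that your file is identical to what was published on Maven Central. It does not prove the code is free of malicious behavior, so it complements, rather than replaces, a license review and your organization's own security scanning.

```python
import hashlib


def file_digest_hex(path, algorithm="sha1", chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_checksum(path, checksum_file_text, algorithm="sha1"):
    """Compare a local file with the text of a published checksum file.

    Checksum files usually contain the hex digest, sometimes followed by a
    file name, so only the first whitespace-separated token is compared.
    """
    expected = checksum_file_text.split()[0].lower()
    return file_digest_hex(path, algorithm) == expected


# Usage sketch (assumes network access; JAR_URL is the jar link from the accepted answer):
# from urllib.request import urlopen
# published = urlopen(JAR_URL + ".sha1").read().decode()
# print(matches_published_checksum("spark-mssql-connector_2.12-1.3.0-BETA.jar", published))
```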
Joel Cochran 5 Reputation points

2024-01-04T15:58:08.25+00:00

This solution worked for our Spark 3.3 pool, but it concerns me greatly that this library is still in BETA and has not been updated since Feb 2023. Synapse Spark pools need to support connecting to Azure SQL databases out of the box.
Jesse Horn-Artera 0 Reputation points

2024-06-10T16:27:08.0566667+00:00

AMEN.
Ali Gohar 20 Reputation points

2024-08-06T11:00:13.5966667+00:00

For Spark 3.3 support, this library has been in BETA since Feb 2023. I have to use it for my production jobs, but I am slightly concerned about its stability given that status. Is there any update on when it will reach GA?
Bandhit Suksiri 21 Reputation points

2024-08-06T11:10:37.6233333+00:00

Unfortunately, it seems it has been in BETA ever since Feb 18, 2023. There is nothing we can do...
Bandhit Suksiri 21 Reputation points

2024-08-06T11:10:58.3933333+00:00

For your reference: https://github.com/microsoft/sql-spark-connector/releases
