Synapse can't find data source com.microsoft.sqlserver.jdbc.spark on apache spark pool 3.3

heta desai 357 Reputation points
2023-07-05T14:25:38.6633333+00:00

I am trying to pull data from Azure SQL Database using spark notebook on Azure synapse spark pool. Apache Spark version 3.3

df = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", Url) \
    .option("query", query) \
    .option("user", User) \
    .option("password", Password) \
    .load()

Here is the error:

Py4JJavaError: An error occurred while calling o3897.load. : java.lang.ClassNotFoundException: Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at https://spark.apache.org/third-party-projects.html

The above command works perfectly fine on Apache Spark 3.1 and 3.2.

Azure Synapse Analytics
Accepted answer
  1. Bhargava-MSFT 31,226 Reputation points Microsoft Employee
    2023-07-05T22:11:44.3+00:00

    @heta desai

    Welcome to the MS Q&A platform.

    This issue has been discussed here.

    Root cause:
    The spark-mssql-connector JAR for Spark 3.3 is still in beta, and there is no official stable release yet. Because of this, the connector is not pre-installed on Spark 3.3 pools, which is why you see the "Failed to find data source" error.

    Resolution:

    The resolution is to add the JAR https://repo1.maven.org/maven2/com/microsoft/azure/spark-mssql-connector_2.12/1.3.0-BETA/spark-mssql-connector_2.12-1.3.0-BETA.jar to your Spark pool using the package management feature:

    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages

    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries

    The issue should be resolved once you add the JAR to the pool.
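
    Until the beta JAR is installed on the pool, one possible workaround (an untested sketch, assuming the Microsoft SQL Server JDBC driver available on Synapse Spark runtimes and placeholder connection values) is Spark's built-in `jdbc` data source, which needs no extra package:

    ```python
    # Sketch: read from Azure SQL Database with Spark's built-in "jdbc" source
    # instead of the beta com.microsoft.sqlserver.jdbc.spark connector.
    # url, query, user, and password are placeholders you must supply.

    def jdbc_read_options(url, query, user, password):
        """Build the option map for spark.read.format("jdbc")."""
        return {
            # e.g. jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>
            "url": url,
            "query": query,  # SQL text pushed down to Azure SQL Database
            "user": user,
            "password": password,
            # Driver class from the mssql-jdbc driver (assumed present on the pool)
            "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        }

    # In a Synapse notebook, where `spark` is predefined:
    # df = (spark.read.format("jdbc")
    #       .options(**jdbc_read_options(Url, query, User, Password))
    #       .load())
    ```

    Note the built-in source lacks some features of the dedicated connector (such as bulk-insert write performance), so it is only a stopgap for reads.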

    Please see this video tutorial for how to add the JAR file to the Spark pool.

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider accepting the answer by hitting "Accept answer" and up-voting it, as this helps the community find answers to similar questions.

