Sharing Tables Between Spark Pools and Serverless SQL Pools in Synapse Analytics

Gabriel25 525 Reputation points
2024-08-23T11:31:00.59+00:00

While the Azure Synapse Analytics documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/metadata/overview) states that Apache Spark pools can access databases and tables created in serverless SQL pools, it's unclear if the reverse is true. If I create a table or external table in a serverless SQL pool, will it be automatically available to a Spark pool?

Additionally, the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/overview-what-is) mentions that both SQL and Spark can directly access Parquet, CSV, TSV, and JSON files in the data lake. If a table is created using a JSON file in a serverless SQL pool, will it be accessible to a Spark pool?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  Smaran Thoomu 24,110 Reputation points Microsoft External Staff Moderator
    2024-08-23T12:44:45.2933333+00:00

    Hi @Gabriel25
    Thanks for the question and for using the MS Q&A platform.

    If I create a table or external table in a serverless SQL pool, will it be automatically available to a Spark pool?

    No. Tables created in a serverless SQL pool are not automatically available to a Spark pool, because the metadata synchronization works in only one direction. Synapse synchronizes Spark database objects to the serverless SQL pool, so they can be queried from serverless SQL even when the Spark pool is not running, but objects created in the serverless SQL pool are not synchronized back to Spark.
    For reference, Spark Table to Serverless Pool Metadata Sync
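    To illustrate the direction in which the sync does work: a Parquet-backed table created from a Spark pool is automatically exposed in a serverless SQL database of the same name. A minimal sketch in Spark SQL (the database and table names below are examples, not anything from your workspace):

```sql
-- Run from a Spark pool notebook (Spark SQL).
-- Database and table names are hypothetical examples.
CREATE DATABASE IF NOT EXISTS mytestdb;

-- Parquet-backed Spark tables are synchronized to the serverless
-- SQL pool, where they appear as external tables in a database
-- with the same name.
CREATE TABLE mytestdb.myparquettable (id INT, name STRING) USING Parquet;
```

    After the Spark statement completes, the table can be queried from the serverless SQL pool without the Spark pool being active. The reverse (creating a table in serverless SQL and reading it as a table from Spark) is not supported.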

    If a table is created using a JSON file in a serverless SQL pool, will it be accessible to a Spark pool?

    No. Tables created in the Synapse serverless SQL pool cannot be accessed directly from outside the pool, and objects created in the SQL engine are not visible to the Spark pool. To retrieve that data from Spark, use the SQL JDBC connector, as shown below.

    # Read a serverless SQL pool table over JDBC using the Apache Spark
    # connector for SQL Server (com.microsoft.sqlserver.jdbc.spark).
    # The connection values below are placeholders - substitute your own.
    url = "jdbc:sqlserver://<workspace-name>-ondemand.sql.azuresynapse.net:1433;database=<database-name>"
    dbtable = "<schema>.<table>"
    user = "<sql-user>"
    password = "<password>"

    print("read data from SQL")
    jdbcDF = spark.read \
            .format("com.microsoft.sqlserver.jdbc.spark") \
            .option("url", url) \
            .option("dbtable", dbtable) \
            .option("user", user) \
            .option("password", password) \
            .load()

    jdbcDF.show(5)


    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?".

    1 person found this answer helpful.

0 additional answers
