Synchronize Apache Spark for Azure Synapse external table definitions in serverless SQL pool

02/16/2022

Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database is created for each database that exists in serverless Apache Spark pools.

For each Spark external table that is based on Parquet or CSV and located in Azure Storage, an external table is created in the serverless SQL pool database. As a result, you can shut down your Spark pools and still query Spark external tables from serverless SQL pool.

When a table is partitioned in Spark, its files in storage are organized into folders. Serverless SQL pool uses the partition metadata to target only the folders and files that are relevant to your query.
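
To make the idea of targeting only relevant folders concrete, the following is a minimal conceptual sketch in Python. It is not how serverless SQL pool is implemented. It assumes Spark's default Hive-style folder naming (`column=value`), and the `sales` table, its `year` and `month` partition columns, and the helper functions `partition_values` and `prune` are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Parse Hive-style `column=value` folder names from a file path."""
    values = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name itself
        if "=" in part:
            column, _, value = part.partition("=")
            values[column] = value
    return values


def prune(paths, **filters):
    """Keep only the files whose partition folders match every filter."""
    return [
        p
        for p in paths
        if all(
            partition_values(p).get(column) == str(value)
            for column, value in filters.items()
        )
    ]


# Hypothetical layout of a table partitioned by year and month.
files = [
    "sales/year=2021/month=12/part-0000.parquet",
    "sales/year=2022/month=1/part-0000.parquet",
    "sales/year=2022/month=2/part-0000.parquet",
]

# A filter on both partition columns leaves a single folder to read.
print(prune(files, year=2022, month=1))
# → ['sales/year=2022/month=1/part-0000.parquet']
```

In this model, a filter on a partition column reduces the set of files before any data is read, which is the effect the partition metadata makes possible.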

Metadata synchronization is automatically configured for each serverless Apache Spark pool provisioned in the Azure Synapse workspace. You can start querying Spark external tables instantly.

Each Spark Parquet or CSV external table located in Azure Storage is represented by an external table in the dbo schema of the corresponding serverless SQL pool database.

To query a Spark external table, run a query that targets the external table [spark_table]. Before you run the following example, make sure you have the correct access to the storage account where the files are located.

SELECT * FROM [db].dbo.[spark_table]
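
If such queries are generated from a script, the three-part name can be assembled programmatically. The following is a minimal sketch in Python that only builds the query text and does not connect to a pool. The helper names `quote_name` and `spark_table_query` are hypothetical, and the bracket quoting follows the usual T-SQL convention of doubling any closing bracket inside an identifier.

```python
def quote_name(identifier):
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def spark_table_query(database, table, top=None):
    """Build a SELECT over the synchronized external table in the dbo schema."""
    limit = f"TOP {int(top)} " if top is not None else ""
    return f"SELECT {limit}* FROM {quote_name(database)}.dbo.{quote_name(table)}"


print(spark_table_query("db", "spark_table"))
# → SELECT * FROM [db].dbo.[spark_table]

print(spark_table_query("db", "spark_table", top=10))
# → SELECT TOP 10 * FROM [db].dbo.[spark_table]
```

Limiting the rows with `TOP` is optional here, but it can be a reasonable way to preview a large table before running a full scan.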

Apache Spark data types to SQL data types mapping

For more information on mapping Apache Spark data types to SQL data types, see Azure Synapse Analytics shared metadata tables.

Next steps

Advance to the Storage Access Control article to learn more about storage access control.
