Querying a Synapse Dedicated Pool External Table: What is going on behind the scenes?

Question

Derek Horrall 201

Out of curiosity, I looked at the request steps while a query was running against a Dedicated Pool external table (created from a Parquet file). It looks like the query actually loads data from the Parquet file in data lake storage into a temporary table in the Synapse dedicated pool, using round-robin distribution. Not an issue; it just surprised me that it works this way. Am I interpreting this correctly?

Answer accepted by question author

Answer 1

Hello,
Yes, you're interpreting it correctly. When you query an external table in an Azure Synapse Analytics dedicated SQL pool, the engine reads the data from the external data source, such as Azure Data Lake Storage, and loads it into a temporary table in the dedicated pool. That temporary table uses round-robin distribution, which spreads the rows evenly across the pool's distributions.
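To make the round-robin idea concrete, here is a minimal conceptual sketch in Python. It is only a toy model, not how Synapse is implemented: the real engine moves data in batches through its data movement service, and the function name `round_robin_assign` and the sample rows are hypothetical. It assumes the 60 distributions that a dedicated SQL pool uses.

```python
from itertools import cycle

# Assumption: a dedicated SQL pool spreads data over 60 distributions.
NUM_DISTRIBUTIONS = 60


def round_robin_assign(rows, num_distributions=NUM_DISTRIBUTIONS):
    """Toy model: hand each incoming row to the next distribution in turn."""
    buckets = [[] for _ in range(num_distributions)]
    targets = cycle(range(num_distributions))
    for row in rows:
        buckets[next(targets)].append(row)
    return buckets


# Hypothetical input: 1,000 rows read from an external Parquet file.
rows = [{"id": i} for i in range(1000)]
buckets = round_robin_assign(rows)
sizes = [len(bucket) for bucket in buckets]

# No distribution holds more than one row more than any other.
print(min(sizes), max(sizes))
```

Because rows are assigned in turn rather than by a hash of some column, the spread stays even regardless of the data values, which is why it suits a staging table whose contents are not known in advance.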

Staging the data this way lets Synapse apply its distributed query processing to the external data, which is crucial for achieving high performance on large datasets.

For external tables backed by Parquet files in Azure Data Lake Storage, Synapse reads the files and loads the required columns into the temporary table. The rest of the query then operates on that temporary table as if the data were stored in the dedicated pool itself.

Note that the data is not persisted in the dedicated pool after the query completes. This approach balances performance and flexibility: you can query data stored in external sources efficiently without having to load and store it permanently in the dedicated pool.
