Querying a Synapse Dedicated Pool External Table: What is going on behind the scenes?

Derek Horrall 201 Reputation points
2023-04-15T12:37:10.6133333+00:00

Out of curiosity, I looked at the request steps while a query was running on a Dedicated Pool External Table (created from a parquet file). It looks like it is actually loading data from datalake storage parquet file into temporary table in synapse dedicated pool and is using RoundRobin. Not an issue, it just surprised me that it worked this way. Am I interpreting this correctly?

User's image

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
0 comments No comments
{count} votes

Answer accepted by question author
  1. Boris Von Dahle 3,221 Reputation points
    2023-04-15T15:37:53.9633333+00:00

    Hello,
    Yes, you're interpreting it correctly. When you query an external table in Azure Synapse Analytics' dedicated SQL pool, it reads the data from the external data source, such as Azure Data Lake Storage, and loads it into temporary tables in the dedicated pool. It uses a Round Robin distribution to distribute the data evenly across the temporary tables.

    This approach allows Synapse to leverage its distributed processing capabilities to perform queries on the data, which is crucial for achieving high performance on large datasets.

    In the case of querying external tables backed by Parquet files in Azure Data Lake Storage, Synapse Analytics reads the data from the Parquet files, and it loads the required columns into the temporary tables in the dedicated pool. The query then operates on these temporary tables as if the data were stored within the dedicated pool itself.

    It's important to note that the data is not persisted in the dedicated pool after the query completes. This approach provides a balance between performance and flexibility, as you can efficiently query data stored in external sources without the need to load and store the data within the dedicated pool permanently.

    0 comments No comments

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.