Azure Synaps serverless scan file that should be filtered

Shlomi Lanton 56 Reputation points
2024-07-01T08:48:06.6533333+00:00

Hey,

We have a large amount of files stored in Storage Account in Parquet format with the following hierarchy.
/m_id=%M_ID%/quarter=%QUARTER%/month=%MONTH%/partition_version=last/part-0.parquet'.
examples for quarter values: 2024Q1, 2024Q2, 2024Q3 etc.

We have a view we use the query the data, the view is defined using OPENROWSET , when running a query a WHERE filter and [quarter] in ('2023Q2', '2023Q3', '2023Q4', '2024Q1') we get an error

Error handling external file: 'IO request completed with an error. ERROR = 0x0000000C'. Underlying data description: file '/machine_id=aaaa/quarter=2024Q2/month=6/partition_version=last/part-0.parquet'.

but note that the quarter of 2024Q2 is not in the query, so we expect this file to not be scanned at all.

  1. How can we dig down to this issue?
  2. can we get a list of all files scanned in a query?
  3. We apply multiple filters in the query (on partitions and values), does the order of elements in the query matter?

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,621 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Shlomi Lanton 56 Reputation points
    2024-07-04T05:14:39.9233333+00:00

    For anyone that come across this in the feature, it seem our issue was with the way we defined the column type for the partitions while building our view. Moving from VARCHAR(MAX) to VARCHAR(%VALUE%) (according to the length of the column) fixed the issue and allowed us to run queries with proper partition pruning.

    0 comments No comments