Azure Synaps serverless scan file that should be filtered

Shlomi Lanton 76 Reputation points
2024-07-01T08:48:06.6533333+00:00

Hey,

We have a large amount of files stored in Storage Account in Parquet format with the following hierarchy.
/m_id=%M_ID%/quarter=%QUARTER%/month=%MONTH%/partition_version=last/part-0.parquet'.
examples for quarter values: 2024Q1, 2024Q2, 2024Q3 etc.

We have a view we use the query the data, the view is defined using OPENROWSET , when running a query a WHERE filter and [quarter] in ('2023Q2', '2023Q3', '2023Q4', '2024Q1') we get an error

Error handling external file: 'IO request completed with an error. ERROR = 0x0000000C'. Underlying data description: file '/machine_id=aaaa/quarter=2024Q2/month=6/partition_version=last/part-0.parquet'.

but note that the quarter of 2024Q2 is not in the query, so we expect this file to not be scanned at all.

  1. How can we dig down to this issue?
  2. can we get a list of all files scanned in a query?
  3. We apply multiple filters in the query (on partitions and values), does the order of elements in the query matter?

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,657 questions
{count} votes

Accepted answer
  1. Bhargava-MSFT 28,936 Reputation points Microsoft Employee
    2024-07-05T17:09:35.64+00:00

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others ", I am reposting your solution in case you'd like to "Accept " the answer.

    Issue:

    Getting the below error when running a query with filter and [quarter] in ('2023Q2', '2023Q3', '2023Q4', '2024Q1')

    Error handling external file: 'IO request completed with an error. ERROR = 0x0000000C'. Underlying data description: file '/machine_id=aaaa/quarter=2024Q2/month=6/partition_version=last/part-0.parquet'.

    Solution:

    The issue was with the way the column type was defined for the partitions while building the view. Moving from VARCHAR(MAX) to VARCHAR(%VALUE%) (according to the length of the column) fixed the issue and allowed queries to run with proper partition pruning.

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    I hope this helps!

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Shlomi Lanton 76 Reputation points
    2024-07-04T05:14:39.9233333+00:00

    For anyone that come across this in the feature, it seem our issue was with the way we defined the column type for the partitions while building our view. Moving from VARCHAR(MAX) to VARCHAR(%VALUE%) (according to the length of the column) fixed the issue and allowed us to run queries with proper partition pruning.