Share via

why adding filepath slows down the query ?

Mahesh Raja Mahalingam 81 Reputation points
2021-11-03T02:44:41.467+00:00

Hi ,

  I am trying to create view for the partitioned parquet files in my filesystem (TB of data). when querying an single parquet file without adding any filepath wildcard performs faster and returns result within few seconds   

146021-withoutfilepath.png

But when I introduce filepath wildcard , it slows down a lot (running more than 10 min). Both supposed to query the same file and i am expecting the same performance. Please advice if I am doing anything wrong

146013-image.png

Azure Synapse Analytics
Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Developer technologies | Transact-SQL
Developer technologies | Transact-SQL

A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.


Answer accepted by question author

Erland Sommarskog 134.7K Reputation points MVP Volunteer Moderator
2021-11-09T22:24:21.553+00:00

I guess you need to use dynamic SQL:

SELECT @sql = concat('SELECT TOP * 100
         FROM  OPENROWSET (BULK, 'http://.../Day=', @day, '/TripID=', @trip', 
          '/*.parquet', FORMAT = PARQUET) AS result'
EXEC sp_executesql

From what I can tell this is supported on Synapse as well.

Was this answer helpful?

0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 134.7K Reputation points MVP Volunteer Moderator
    2021-11-03T22:31:00.733+00:00

    Let me start off to say that I know very little about Synapse.

    But I can't say that I am surprised by the performance difference. For the performance to be the same, the optimizer would know how to push the arguments in the WHERE clause into the OPENROWSET function. Else OPENROWSET needs to retrieve all data, and only then Azure Synapse is able to filter it. On the other hand, with an exact path, OPENROWSET can go straight to the matter.

    But as I said, I do not work with Synapse, so I could be completely off target.

    Was this answer helpful?


  2. ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator
    2021-11-03T16:40:12.653+00:00

    Hi @Mahesh Raja Mahalingam ,

    Thank you for posting your query in Microsoft Q&A Platform.

    Could you please try using IN keyword for filepath() function and see how it behaves. Click here to know about filepath() function

    146179-image.png

    Was this answer helpful?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.