Azure Synapse serverless OPENROWSET Parquet

sakuraime
2021-05-04T15:02:21.597+00:00

Suppose I have a Parquet folder structure like this:

table/year/month/*.parquet

My query is:

SELECT COUNT(*)
FROM OPENROWSET(BULK '/path/table/2021/10/*.parquet', DATA_SOURCE = 'source', FORMAT = 'PARQUET')
AS a

The above covers one year. What if I want to query two years, 2021 and 2020? What's the syntax? Something like this?

SELECT COUNT(*)
FROM OPENROWSET(BULK '/path/table/2021,2020/10/*.parquet', DATA_SOURCE = 'source', FORMAT = 'PARQUET')
AS a

Azure Synapse Analytics

1 answer

  1. Samara Soucy - MSFT
    2021-05-05T00:29:42.78+00:00

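    You can do this by combining a wildcard (*) in the BULK path with the filepath() function. filepath(n) returns the part of the path matched by the n-th wildcard, so if the year folder is replaced with *, filepath(1) returns the year. Because filepath() returns text, compare it against string values:

    SELECT COUNT(*)
    FROM OPENROWSET(
        BULK '/path/table/*/10/*.parquet',
        DATA_SOURCE = 'source',
        FORMAT = 'PARQUET'
    ) AS a
    WHERE
        a.filepath(1) IN ('2020', '2021')  -- text matched by the first *, i.e. the year folder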
    
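To check which year folders the filter actually picks up, a variant of the query above can return the row count per year instead of a single total. This is a sketch that assumes the table/year/month/*.parquet layout from the question and the placeholder path '/path/table' and data source name 'source'; the [year] and row_count column names are illustrative only.

```sql
-- Per-year row counts for October of 2020 and 2021.
-- Assumes the table/<year>/<month>/*.parquet layout and the
-- placeholder data source 'source' from the question.
SELECT
    a.filepath(1) AS [year],    -- text matched by the first * (the year folder)
    COUNT(*)      AS row_count
FROM OPENROWSET(
        BULK '/path/table/*/10/*.parquet',
        DATA_SOURCE = 'source',
        FORMAT = 'PARQUET'
    ) AS a
WHERE a.filepath(1) IN ('2020', '2021')  -- filepath() returns text, so compare as strings
GROUP BY a.filepath(1)
ORDER BY [year];
```

If the month should vary as well, one option is to replace 10 with a second * and add a filter on a.filepath(2), which returns the text matched by the second wildcard.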
