Azure Synapse serverless OPENROWSET Parquet

sakuraime 2,316 Reputation points
2021-05-04T15:02:21.597+00:00

Suppose I have a parquet folder

table/year/month/*.parquet

and my query is

SELECT count(*)
FROM OPENROWSET(BULK '/path/table/2021/10/*.parquet',DATA_SOURCE='source',FORMAT = 'PARQUET')
AS a

The above covers a single year.
What if I want to query two years, 2021 and 2020? What's the syntax? I tried something like this:

SELECT count(*)
FROM OPENROWSET(BULK '/path/table/2021,2020/10/*.parquet',DATA_SOURCE='source',FORMAT = 'PARQUET')
AS a

Azure Synapse Analytics

1 answer

  1. Samara Soucy - MSFT 5,046 Reputation points
    2021-05-05T00:29:42.78+00:00

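    You can do this by combining a wildcard (*) in the path with the filepath() function. filepath(n) returns the part of the path matched by the n-th wildcard, so if the year folder is the first wildcard you can filter on it:

    SELECT count(*)  
    FROM OPENROWSET(BULK '/path/table/*/10/*.parquet',DATA_SOURCE='source',FORMAT = 'PARQUET')  
    AS a  
    WHERE  
        a.filepath(1) IN ('2020','2021')  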
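
    To check which folders the filter actually picks up, a variation along these lines may help. It is only a sketch: it assumes the year folder is the first wildcard in the BULK path (so it is addressed as filepath(1)), reuses the '/path/table/...' path and the 'source' data source from the question as-is, and compares against strings because filepath() returns text.

    ```sql
    -- Sketch: row count per year folder for month 10.
    -- Assumption: the first * in the BULK path stands for the year folder,
    -- so filepath(1) returns values such as '2020' or '2021'.
    SELECT
        a.filepath(1) AS [year],
        COUNT_BIG(*)  AS [rows]
    FROM OPENROWSET(
            BULK '/path/table/*/10/*.parquet',
            DATA_SOURCE = 'source',
            FORMAT = 'PARQUET'
        ) AS a
    WHERE a.filepath(1) IN ('2020', '2021')  -- keep only the two years of interest
    GROUP BY a.filepath(1);
    ```

    If both years are present you should get one row per year; a missing row suggests the folder name does not match the filter value exactly.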
    