Is there a way to query or return folder names in Synapse Analytics?
I am working with partitioned azure cost data in a data lake. Data is partitioned by month and day. One problem is that the daily data can extend beyond the current month. The second issue is that the day folder contains all the data for the entire month, so when I query the data, I only need to get data from the last day folder each month.
I have been able to work out a query to get the last folder, but it requires me to scan all the files 1st, which means I am essentially doing a full folder scan (currently about 6GB of data) to identify the 400 MB of data I need to read.
I would like to know if anyone knows of a way that I can return just the names of folders in a Serverless SQL Query without reading all the files in the folders.
Here is a copy of the code I am currently working with:
with usethesefiles as (
select
distinct
files.filepath(1) as file_path_one
,files.filepath(2) as file_path_two
from
openrowset(bulk 'azurecosts/azuredailyexport/*/*/*/*.csv'
, data_source = 'mydatalake_dfs_core_windows_net'
, format = 'CSV'
, parser_version = '2.0'
, string_delimiter = '"'
, header_row = true
)as files)
select file_path_one,
max(file_path_two) as file_path_two
from usethesefiles
group by file_path_one