Serverless SQL Pools filepath() function and range filters

Serverless SQL 216 Reputation points MVP
2021-06-28T10:34:51.03+00:00

Hi,

I have a question about why Serverless SQL Pools scans all data in the source folders when I use the filepath() function as a column in a View and join a Date table to it.

EG this query will scan all data in the source folder. The FilePathDate is a column in the [LDW].[vwSalesOrdersLinesBI] view that exposes the filepath() value

select count(fct.OrderID)
from [LDW].[vwSalesOrdersLinesBI] as fct
inner join [LDW].[vwDimDate] dd on dd.Date = fct.[FilePathDate]
where dd.Date between '2016-07-08' and '2016-07-10'

However, if I change the query to use IN and remove the join, then only the specific source folders are queried and data processing is reduced:

select count(fct.OrderID)
from [LDW].[vwSalesOrdersLinesBI] as fct
--where fct.FilePathDate IN (SELECT '2016-07-08' UNION SELECT '2016-07-09' UNION SELECT '2016-07-10')
where fct.FilePathDate IN (CAST( '20160712 00:00:00' AS datetime),CAST( '20160726 00:00:00' AS datetime),CAST( '20160713 00:00:00' AS datetime))

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,429 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Serverless SQL 216 Reputation points MVP
    2021-07-02T09:26:32.517+00:00

    Hi,

    If I run that query then I get all the data being scanned 6.49GB (which is the source CSV data in the data lake).

    111344-image.png

    If I run the following query then I only get 29MB scanned

    select count(fct.OrderID)
    from [LDW].[vwSalesOrdersLinesBI] as fct
    inner join [LDW].[vwDimDate] dd on dd.Date = fct.[FilePathDate]
    where dd.Date in ('2016-07-08')

    It seems that if there are multiple values being passed to the DimDate table to filter, then it doesn't select only the source folders, it just scans all the source data. If a single value is passed in, then it will successfully only scan the folder it needs to.

    0 comments No comments