Hi,
If I run that query then I get all the data being scanned 6.49GB (which is the source CSV data in the data lake).
If I run the following query then I only get 29MB scanned
select count(fct.OrderID)
from [LDW].[vwSalesOrdersLinesBI] as fct
inner join [LDW].[vwDimDate] dd on dd.Date = fct.[FilePathDate]
where dd.Date in ('2016-07-08')
It seems that if there are multiple values being passed to the DimDate table to filter, then it doesn't select only the source folders, it just scans all the source data. If a single value is passed in, then it will successfully only scan the folder it needs to.
Hi,
Here's the data from both Views:
Basically if you use IN and pass a list of dates in then only those folders that match the date values are processed. If you use a JOIN then this is not respected and all folders are scanned, it seems that using a JOIN means Serverless SQL ignores the date ranges and just scans all data
Can you please let me kow what happens when you run the query
select count(fct.OrderID)
from [LDW].[vwSalesOrdersLinesBI] as fct
inner join [LDW].[vwDimDate] dd on dd.Date = fct.[FilePathDate]
where dd.Date in ('2016-07-08' ,2016-07-10')
Thanks
Himanshu