I will try my best to provide some example tables and as detailed an explanation as possible. If anything is not clear, please ask. The design is not in 3NF, don't blame me for it, I didn't design it and it would be a mammoth task to redesign it, and I'm not allowed to anyway.
So this particular issue is one that tends to occur a lot due to the nature of the data which is time-based and there is a lot of it, hundreds of millions of rows that are sliced and diced many different ways, in many different tables.
The base data is captured at particular times throughout the day for each product, let's call these ProductEvents, and there is an EventWindow for each event, that is actually the time in seconds between the EventDtTm and the next - pre-calculated by the process that creates the events. N.B. The times of events and the length of their windows are not consistent throughout the day nor for different products:
CREATE TABLE dbo.Products
(
ProductID INT IDENTITY(1,1) PRIMARY KEY,
ProductName VARCHAR(10)
);
CREATE TABLE dbo.ProductEvents
(
EventID INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT,
EventTypeID INT,
EventDtTm DATETIME2(7),
EventWindow INT
);
Although there is an EventTypeID and therefore an EventType table, I'm only interested in one particular event out of about a dozen or so, and this is the case in almost all of the various instances of this coding issue, and the tables and data do vary somewhat but this is the most difficult of them, the rest are easier, anyway read on and it should become clear, I hope.
Now there is another set of events that are strictly taken, or rather, pre-calculated for every minute, some may be missing depending on the availability of the raw data that we obtain from various external sources, and is designed such that there is meta data for each product per day providing a MetaID, and then there is the detail data for minutes per product per event type, again only interested in one particular event.
N.B. There are multiple sources for all of this data but it doesn't really affect the overall query as the sources have to match.
CREATE TABLE dbo.MinuteEventsMetaData
(
MetaID INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT,
MinuteEventDt DATE
);
CREATE TABLE dbo.MinuteEventsDetail
(
MetaID INT PRIMARY KEY,
ValueTypeID INT,
EventMinute TIME(2),
EventValue FLOAT
);
So the objective is to find the value for a particular product event at percentages of the event window - the percentages can change each time but let's stick with 25, 50, & 75 for the sake of simplicity.
If we have a ProductEvent for which the time is 8:30am, and a time window of 3600 seconds, then we would calculate that the window ends at 9:30am, and the times for the percentages would be 8:45am, 9am, and 9:15am respectively. Nice and simple.
Well now, there are ~400million rows in the ProductEvents table, of which only ~7million are of interest, but combine that with the percentages and that results in 7 * 3 = 21million. Actually it's worse than that as the scenario I'm actually looking at right now is for each 5% interval so the total number of rows is 7 * 19 = 133million.
In this resulting data, let's save the intermediate data in a table (variable or temp) as:
DECLARE @ProductPercentages TABLE
(
ProductID INT,
EventDtTm DATETIME2(7),
EventDate DATE, -- calculated from ProductEvents.EventDtTm
EventTime TIME(2), -- calculated from ProductEvents.EventDtTm
Percentage INT,
PercentageTm TIME(2),
MetaID INT,
PRIMARY KEY
(
ProductID,
EventDtTm,
Percentage
)
);
Now we need to get the value from the MinuteEventsDetail that is the closest to but less than or equal to the calculated time for the percentage, and it must be more than the time for the ProductEvent, i.e. EventDtTm. First we need to get the MetaID based on the ProductID and the MinuteEventDt for the EventDtTm, in order to query the MinuteEventsDetail table:
So I'd tried this:
SELECT PP.ProductID,
PP.EventDate,
PP.EventTime,
PP.Percentage,
PP.PercentageTm,
PP.MetaID,
vME.EventValue AS PercentageValue
FROM ProductPercentages as PP
CROSS APPLY
( SELECT TOP 1
EventMinute,
EventValue
FROM dbo.MinuteEvents AS ME
WHERE ME.MetaID = PP.MetaID
AND ME.ValueType = 3
AND ME.EventMinute BETWEEN PP.EventTime AND PP.PercentageTm
ORDER BY EventMinute DESC) AS vME;
As mentioned the ProductPercentages could hold at least 21million rows, and the MinuteEvents table has over 60billion rows. So the above query performs horrendously.
I've tried to break it down, and use an alternative approach but so far all I've achieved is to get frustrated.
Please help.
Thanks,
Michael MacGregor
Senior DBA