question

NickRyan-1389 avatar image
0 Votes"
NickRyan-1389 asked NickRyan-1389 commented

Find Missing Sequence numbers for a variety of different keys

I've found numerous options for finding missing sequence numbers where the sequence is straight-forward. In my case it's only part of a business key and I can't get my head around how I would use one of those methods in my case.

I have a table where generation in MW is recorded every half hour of the day for around 60 electricity generation stations. The key of each row is Station Node Code, Trading Date and Period ID. Each day has 46, 48 or 50 periods. 46 when we change to Daylight Savings and skip an hour and 50 when we revert to Standard Time and repeat an hour.

So, for each trading date and Station, I'd like to know if there are any trading periods missing.

Small sample:

204866-screenshot-2022-05-24-115820.png

As you can see for one of these stations, we're not missing any measurement periods on that trading day but we are for the others. I'd like to produce a list that would show just those stations, trading dates and trading periods where there is no row like this:

ARA2201 ARA0 2022-04-03 7
ARA2201 ARA0 2022-04-03 8
ARI1101 ARI0 2022-04-03 7
ARI1101 ARI0 2022-04-03 8
etc.

sql-server-transact-sql
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

JingyangLi avatar image
2 Votes"
JingyangLi answered NickRyan-1389 commented

The recursive solution may not provide your optimal performance.
You can try to use a Number table without temp table in one run. I don't have the setup to test the query but you may update this thread to let us know.

   --===== Create number table on-the-fly
 ;WITH Num1 (n) AS (
 SELECT 1 as n
 UNION ALL SELECT n+1 as n
 FROM Num1 Where n <101),
 Num2 (n) AS (SELECT 1 FROM Num1 AS X, Num1 AS Y),
 Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) FROM Num2)
      
  ,mycte1 as (
  SELECT STATION_NODE_CODE,TRADING_DATE,MAX(TRADING_PERIOD) AS maxTRADING_PERIOD 
  FROM dbo.FCT_SCADA_GENERATION t 
  GROUP BY TRADING_DATE,STATION_NODE_CODE
  )
  ,mycte2 as (
  select STATION_NODE_CODE,TRADING_DATE,n as TRADING_PERIOD from mycte1 m
  Cross apply  nums where nums.n<=m.maxTRADING_PERIOD 
  )
  Select C.STATION_NODE_CODE,C.TRADING_DATE,C.TRADING_PERIOD
   from mycte2 C
  LEFT JOIN dbo.FCT_SCADA_GENERATION T 
  ON C.STATION_NODE_CODE=T.STATION_NODE_CODE 
  AND C.TRADING_DATE=T.TRADING_DATE 
  AND C.TRADING_PERIOD=T.TRADING_PERIOD
  WHERE T.TRADING_PERIOD IS NULL AND ( C.TRADING_DATE > '1997-03-21' OR C.TRADING_PERIOD > 16 )


· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

This has the best performance out of the solutions. It took 5:31 to run on my server and data. This is 3 times faster than my variation on LiHong's original script so is the solution I will use.

0 Votes 0 ·

You can use an auxiliary number table in your instance to help you to write high performance queries besides this query.

0 Votes 0 ·

Yes, thank you for the suggestion.

I created a numbers table as a clustered unique index and it dropped my new test query from 3 seconds to < 1. That's not too important but it may be other times I use this table.

0 Votes 0 ·
JingyangLi avatar image
0 Votes"
JingyangLi answered

You can construct a full list table include all 9 periods for all Station Node Code, Trading Date as a temp table.
Use your current table left join to the temp table and check the missing period Ids for these rows in temp table as your result.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

LiHongMSFT-3908 avatar image
1 Vote"
LiHongMSFT-3908 answered NickRyan-1389 commented

Hi @NickRyan-1389
Try this:

 ;WITH CTE1 AS
 (
  SELECT STATION_NODE_COED,TRADING_DATE,MAX(TRADING_PERIOD) AS TRADING_PERIOD
  FROM Table_Name  
  WHERE TRADING_DATE='2022-04-03'
  GROUP BY TRADING_DATE,STATION_NODE_COED
 ),CTE2 AS
 (
  SELECT STATION_NODE_COED,TRADING_DATE,1 AS TRADING_PERIOD
  FROM CTE1
  UNION ALL
  SELECT C1.STATION_NODE_COED,C1.TRADING_DATE,C2.TRADING_PERIOD+1
  FROM CTE2 C2 JOIN CTE1 C1 ON C1.STATION_NODE_COED=C2.STATION_NODE_COED 
                           AND C1.TRADING_DATE=C2.TRADING_DATE
                           AND C2.TRADING_PERIOD < C1.TRADING_PERIOD
 )
 SELECT C.* 
 FROM CTE2 C LEFT JOIN Table_Name T ON C.STATION_NODE_COED=T.STATION_NODE_COED AND C.TRADING_DATE=T.TRADING_DATE AND C.TRADING_PERIOD=T.TRADING_PERIOD
 WHERE T.TRADING_PERIOD IS NULL
 ORDER BY TRADING_DATE,STATION_NODE_COED,TRADING_PERIOD

Best regards,
LiHong


If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

This works very well for that single trading day. I modified the query by removing that WHERE clause to search back to the beginning of the dataset (in 1997) with just under 30 million rows in total. The query has been running for 17 minutes, as I write this.

Is there a way to replicate the recursive CTE using temp tables? I feel that if I could stage this and perhaps add indices to the temp tables, it may improve performance.

0 Votes 0 ·

Do you mean this ?

 ;WITH CTE1 AS
 (
  SELECT STATION_NODE_COED,TRADING_DATE,MAX(TRADING_PERIOD) AS TRADING_PERIOD
  FROM Table_Name  
  GROUP BY TRADING_DATE,STATION_NODE_COED
 ),CTE2 AS
 (
  SELECT STATION_NODE_COED,TRADING_DATE,1 AS TRADING_PERIOD
  FROM CTE1
  UNION ALL
  SELECT C1.STATION_NODE_COED,C1.TRADING_DATE,C2.TRADING_PERIOD+1
  FROM CTE2 C2 JOIN CTE1 C1 ON C1.STATION_NODE_COED=C2.STATION_NODE_COED 
                           AND C1.TRADING_DATE=C2.TRADING_DATE
                           AND C2.TRADING_PERIOD < C1.TRADING_PERIOD
 )
 SELECT * INTO CTE_TEMP_TABLE
 FROM CTE2 
    
 SELECT C.* FROM 
 CTE_TEMP_TABLE C LEFT JOIN Table_Name  T ON C.STATION_NODE_COED=T.STATION_NODE_COED AND C.TRADING_DATE=T.TRADING_DATE AND C.TRADING_PERIOD=T.TRADING_PERIOD
 WHERE T.TRADING_PERIOD IS NULL
 ORDER BY TRADING_DATE,STATION_NODE_COED,TRADING_PERIOD
    
 --DROP TABLE CTE_TEMP_TABLE


0 Votes 0 ·

That one I left running for 2 hours before killing it. Your script was a great help, thank you. With my mods it gives me what I want in a reasonable amount of time.

0 Votes 0 ·