How to count consecutive values across rows?

Wheat, Samuel G (Contractor) 21 Reputation points
2022-01-27T18:12:55.143+00:00

1 . ) How to count consecutive values across rows? Start from zero and count cumulatively each time the value repeats. Restart from zero each time the value changes.
Example:

	SomeValue	ConsecutiveCount  
	1			0  
	1			1  
	1			2  
	0			0  
	1			0  
	1			1  
	  

Attached script creates a db with two tables that shows an attempt to accomplish the above.

2 . ) Query in attached script is obviously incorrect with respect to accomplishing the task stated above. That said, the results returned by the query appear to be inconsistent and unmatchable to any pattern demonstated in the data. For example the following row returned by the query against TableB shows a value of 2 in the SameConsecutiveResponseCount column:

	2018-03-14 00:00:00.000	67115991	1	0	2		  

For the query as written in the attached script, how is the value 2 calculated? What does the value 2 represent?
169192-capture.png

3 . ) The same query as the attached run against the same data on our production server yields different results. I had to truncate some data to keep the example small, however as shown in the screen cap below the SameConsecutiveResponseCount starts at 3 and increments by 3 for each row. The first three rows on the prod server are exactly the same as what is provided in the example. At a minimum I would expect those rows to return identical results. Why does this occur?

169039-capture2.png

169098-consecutive-values-problem.txt

select @@version  

Microsoft SQL Server 2019 (RTM-GDR) (KB4583458) - 15.0.2080.9 (X64) Nov 6 2020 16:50:01 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 18363: )

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,546 questions
0 comments No comments
{count} votes

Accepted answer
  1. LiHong-MSFT 10,046 Reputation points
    2022-01-28T01:36:18.76+00:00

    Hi @Wheat, Samuel G (Contractor)
    For questions 2.) Because you SUM with partition by Response,which means: " first SUM partition :Response=0" , "second SUM partition :Response=1"

    sum(case when Response = LastResponse then 1 else 0 end) over(partition by Response order by InterrogationDate)

    Change order by clause to this:order by SerialNumber, Response,InterrogationDate and you will find the reason.
    169261-image.png


5 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 100.8K Reputation points MVP
    2022-02-04T21:24:23.703+00:00

    I said previously that it would be possible to express this in T-SQL that this can be done with a single iteration over the data. I was wrong. My friend Itzik Ben-Gan, who is a lot smarter than me when it comes to T-SQL, was kind to share a solution:

    with c as
    (
      select *,
        row_number() over(partition by SerialNumber order by InterrogationDate) as rn,
        lag(Response) over(partition by SerialNumber order by InterrogationDate) as
    PrevResponse
      from dbo.TableB
    )
    select *,
      rn - max(case when Response = PrevResponse then null else rn - 1 end)
             over(partition by SerialNumber order by InterrogationDate rows unbounded
    preceding) as SeqNo
    from c;
    

    As far as I can tell, it gives the desired result.

    I can speak for the performance, because I don't have your table, but it is essential that you have this index:

    CREATE INDEX SerialNumber_InterrogationDate (SerialNumber, InterrogationDate) INCLUDE (Response)
    

    If the clustered index on the table is on (SerialNumber, InterrogationDate) that should work, but I note that in the script you posted, you have the key columns in TableB in reverse order. That index is not good for this query, because the data has to be re-sorted.

    1 person found this answer helpful.
    0 comments No comments

  2. Viorel 111.7K Reputation points
    2022-01-27T19:17:47.603+00:00

    Check if this query works for your sample TableB:

    ;
    with Q as
    (
        select *, 
            row_number() over (partition by SerialNumber order by InterrogationDate) -
                row_number() over (partition by SerialNumber, Response order by InterrogationDate) d
        from TableB 
    )
    select InterrogationDate, SerialNumber, Response,
        row_number() over (partition by SerialNumber, Response, d order by InterrogationDate) as SameConsecutiveResponseCount
    from Q
    order by SerialNumber, InterrogationDate
    
    0 comments No comments

  3. Wheat, Samuel G (Contractor) 21 Reputation points
    2022-01-27T20:55:12.983+00:00

    Hello, @Viorel , thank you for your answer. Yes, the query you provided does work on the sample data. Thanks very much.
    Unfortunately, however the query runs slowly on our production data. The table is very large - 1.2 billion rows - so we need it to run as quickly as possible. I have let it run for over 45 mins and it does not return 1000 rows. Is there any chance a query like the one I provided can be made to work? InterrogationDate and SerialNumber are the composite PK and are indexed.

    I am still very interested to know the answers to questions 2.) and 3.) that I posted.

    Regards,
    Sam

    169187-capture.png


  4. Erland Sommarskog 100.8K Reputation points MVP
    2022-01-27T22:50:24.467+00:00

    To start somewhere: You should download and install CU14 which is the most recent Cumulative Update of SQL 2019 to get access to many bug fixes and improvements.

    As for your question 2): The 2 is the same as the 2 just above. You are saying:

    sum(case when Response = LastResponse then 1 else 0 end) over(partition by Response order by InterrogationDate)
    

    At this point Response is 1 and LastResponse is 0, so the CASE expression returns 0 and adds nothing to the sum. So the 2, which is in the same partition group is left unchanged.

    As for what happens in 3) I'm not sure that I want to go into. Your LAG has one PARTITION clause and the SUM has another, and, well, it obviously does not end well.

    As for your actual problem, this one of these problems that are difficult to implement efficiently with a pure set-based solution. It's obvious that it can be done as a single pass over the data, but there is no way to express that efficiently. So, yes, it will have to be a loop. However, it does not have to be a loop one row at a time, but you can loop abreast. That is, I assume that you want this numbering for an individual SerialNumber, so you can handle multiple SerialNumber in parallel. With that brutal amount of data, you should probably take a couple at a time.

    To give you some inspiration you can watch my lightning talk Looping Abreast.