Group data into buckets of up to a certain row count limit

IgorM 61 Reputation points
2022-08-06T20:25:03.537+00:00

Hi,

I have a transact table with many rows of data. There is a TransactDate column in the table. I need to run a query which would group the transactions into groups of, say, up to 1000 rows in an ascending order of TransactDate. I also need to retrieve the earliest and the latest transaction date for each group created.

Kind regards,

Developer technologies | Transact-SQL
{count} votes

Accepted answer
  1. Viorel 122.6K Reputation points
    2022-08-06T20:38:52.597+00:00

    Check these sample queries:

    -- assigning group numbers to rows  
    select *,   
        (row_number() over (order by TransactDate) - 1) / 1000 + 1 as GroupNumber  
    from MyTable  
      
    -- earliest and latest transaction date for each group  
    ;  
    with Q as  
    (  
        select *,   
            (row_number() over (order by TransactDate) - 1) / 1000 + 1 as GroupNumber  
        from MyTable  
    )  
    select GroupNumber, min(TransactDate) as Earliest, max(TransactDate) as Latest  
    from Q  
    group by GroupNumber  
    
    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Bert Zhou-msft 3,436 Reputation points
    2022-08-08T02:23:59.027+00:00

    Hi,@IgorMuryjas-7562

    Since you didn't provide us with a DDL statement, we can't guess which columns are valid in your table, try the solutions below and let us know in your spare time.

    DECLARE @ntile int  
    SET  @ntile = (SELECT count(1) from yourtable) / 1000  
    ;WITH CTE AS  
        (  
        SELECT Id,NTILE(@ntile) OVER (ORDER BY id) myGroup,Min(TransDate) FirstDate, Max(TransDate) LastDate  
        FROM yourtable  
        GROUP BY Id  
        )  
    SELECT T.ID,C.myGroup,C.FirstDate , C.LastDate  
        FROM yourtable T  
        INNER JOIN CTE C on T.ID = c.ID and t.TranDate = c.LastDate  
    

    Bert Zhou

    0 comments No comments

  2. IgorM 61 Reputation points
    2022-08-08T21:08:08.863+00:00

    Hi,

    thank you for both your replies. Just for completeness please find below DDL statements that create the table and populate it with sample data - apologies, I should have included it with my original question.

    CREATE TABLE [dbo].[TestTable](  
    	RowID				INT IDENTITY (1,1),  
    	[TransactDate]		[datetime] NOT NULL  
    ) ON [PRIMARY]  
    GO  
      
    DECLARE @StartDate datetime = {d '2022-01-01'};  
    DECLARE @EndDate datetime = {d '2022-07-31'};  
    DECLARE @CurrentDate datetime;  
    DECLARE @RowCount INT;  
    DECLARE @NumRows INT;  
      
    SET @CurrentDate = @StartDate;  
    WHILE (@CurrentDate <= @EndDate)  
    BEGIN	  
    		SET @NumRows = CAST(RAND()*1000 AS INT);  
    		SET @RowCount = 1;  
    		WHILE (@RowCount <= @NumRows)  
    			BEGIN  
    				INSERT INTO TestTable (TransactDate) VALUES (@CurrentDate);  
    				SET @RowCount = @RowCount + 1  
    			END  
    		SET @CurrentDate = DATEADD(DAY, 1, @CurrentDate)  
    END  
    

    It seems to me that the answer provided by Viorel-1 does what I need, specifically this query:

    with Q as  
     (  
         select *,   
             (row_number() over (order by TransactDate) - 1) / 1000 + 1 as GroupNumber  
         from MyTable  
     )  
     select GroupNumber, min(TransactDate) as Earliest, max(TransactDate) as Latest  
     from Q  
     group by GroupNumber  
    

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.