Retrieve data from Billions of rows

Vivek D 1 Reputation point
2022-05-10T04:16:41.16+00:00

Hi,

I am trying to run query from a very big table that has billions of rows. I am looking for aggregating data along with some other columns in the select statement. Since its a group by operation, I have to add all the columns in the group by clause. Is there any alternative/faster way to do it?

E.x.

Select column1, column2, column3, column4, column5, column6, sum(DollarAmount) from dbo.table
Group by column1, column2, column3, column4, column5, column6

Any help will be greatly appreciated!

Thank you!

Developer technologies Transact-SQL
0 comments No comments
{count} votes

3 answers

Sort by: Most helpful
  1. LiHong-MSFT 10,056 Reputation points
    2022-05-10T05:54:28.113+00:00

    Hi @Vivek D
    Have you tried adding Indexes,like this:

    CREATE /*UNIQUE*/ NONCLUSTERED INDEX ix_DollarAmount ON TableName (column1, column2, column3, column4, column5, column6,DollarAmount)   
    

    In addition,if your aggregate functions are MAX() or MIN(), you could also consider using Ranking Functions.
    Check this sample:

    CREATE TABLE #temp(COL_A INT,COL_B INT,COL_C VARCHAR(10))  
    INSERT INTO #temp VALUES(1,111,'abc'),(1,122,'def'),(1,133,'ghi'),(2,222,'jkl'),(2,211,'mno'),(3,333,'pqr')  
    --Solution 1  
    SELECT T1.*  
    FROM #temp T1 JOIN (SELECT COL_A,MAX(COL_B)COL_B  
                        FROM #temp  
                        GROUP BY COL_A )T2 ON T1.COL_A=T2.COL_A AND T1.COL_B=T2.COL_B  
    --Solution 2  
    ;WITH CTE AS  
    (  
     SELECT *,ROW_NUMBER()OVER(PARTITION BY COL_A ORDER BY COL_B DESC) AS RowNum  
     FROM #temp  
    )  
    SELECT COL_A,COL_B,COL_C  
    

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  2. Naomi Nosonovsky 8,431 Reputation points
    2022-05-10T14:28:22.713+00:00

    You can use SUM(Column) over (partition by ...) as Total, so you would not need to group by all columns, but the performance most likely be the same, you may want to try both variations.

    0 comments No comments

  3. Tom Phillips 17,771 Reputation points
    2022-05-10T14:47:22.203+00:00

    For columns which are the same for the same item, you can use MAX(colname) to shorten the group by

    Assuming column1+column2 are the key fields

    Select column1, column2, max(column3) as column3, max(column4) as column4, max(column5) as column5, max(column6) as column6, sum(DollarAmount) from dbo.table
    Group by column1, column2
    
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.