Best approach to load data from a table with billions of rows

RD 1 Reputation point
2021-09-24T15:35:01.077+00:00

Hi -

We have a SQL Server table with over a billion rows in it, with ID as the identity column. I have a query that filters the source table on a certain condition, and I want to load the matching rows into another (target) table (all the columns from the source table except the ID column). The target table has its own identity ID column. The filtered rows from the source table come to about 300M rows. What's the best approach to load the data into the target table under this condition? Any help will be appreciated.

Thanks,


1 answer

  1. Erland Sommarskog 116.5K Reputation points MVP
    2021-09-24T22:01:12.37+00:00

    300 million rows would be a bit more than a mouthful, so you should probably load the rows in batches of maybe five million rows at a time. Although the optimal batch size depends on a lot of things: How wide are the rows? Are there LOB columns? Also, how much memory is in the server, etc.
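
    If you are not sure how wide the rows are, you can check the average record size of the source table's clustered index. A minimal sketch, assuming the source table is named dbo.src (a hypothetical name):

    -- Average row size in bytes for the clustered index (index_id = 1).
    -- SAMPLED mode fills in avg_record_size_in_bytes without reading
    -- every page; dbo.src is a placeholder for the real table name.
    SELECT index_id, avg_record_size_in_bytes, page_count
    FROM   sys.dm_db_index_physical_stats(
               DB_ID(), OBJECT_ID('dbo.src'), 1, NULL, 'SAMPLED');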

    It is extremely important that the batches follow the clustered index. Here is a pattern you can follow:

    DECLARE @minID     bigint,
            @maxID     bigint,
            @batchsize int = 5000000;

    SELECT @minID = MIN(ID) FROM src;

    WHILE @minID IS NOT NULL
    BEGIN
        -- Find the upper end of this batch along the clustered index.
        SELECT @maxID = MAX(ID)
        FROM   (SELECT TOP (@batchsize) ID
                FROM   src
                WHERE  ID >= @minID
                  AND  (otherconditions)
                ORDER  BY ID) AS b;

        INSERT target (...)
           SELECT ....
           FROM   src
           WHERE  ID BETWEEN @minID AND @maxID
             AND  (otherconditions);

        -- Advance to the first ID beyond the batch just copied.
        SELECT @minID = (SELECT MIN(ID) FROM src WHERE ID > @maxID);
    END
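
    For illustration, here is a sketch of the same pattern wired up end to end, with hypothetical names (dbo.src, dbo.target, payload columns col1-col3, and a filter column SomeFilterCol) and each batch committed in its own transaction, so the log never has to hold all 300 million rows at once:

    DECLARE @minID     bigint,
            @maxID     bigint,
            @batchsize int = 5000000;   -- tune to row width and memory

    -- dbo.src, dbo.target, col1-col3 and SomeFilterCol are hypothetical.
    SELECT @minID = MIN(ID) FROM dbo.src;

    WHILE @minID IS NOT NULL
    BEGIN
        SELECT @maxID = MAX(ID)
        FROM   (SELECT TOP (@batchsize) ID
                FROM   dbo.src
                WHERE  ID >= @minID
                  AND  SomeFilterCol = 1
                ORDER  BY ID) AS b;

        BEGIN TRANSACTION;

        -- No ID in the column list: the target generates its own identity.
        INSERT dbo.target (col1, col2, col3)
           SELECT col1, col2, col3
           FROM   dbo.src
           WHERE  ID BETWEEN @minID AND @maxID
             AND  SomeFilterCol = 1;

        COMMIT TRANSACTION;

        SELECT @minID = (SELECT MIN(ID) FROM dbo.src WHERE ID > @maxID);
    END

    Committing per batch lets log space be reused between batches under the SIMPLE recovery model; under FULL, take frequent log backups while the load runs.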
    
