Compare over 90 million rows of 2 Tables

hamsi hamsi 1 Reputation point
2021-09-28T08:45:30.297+00:00

Hi,

i want compare 2 table contents for each row and column,

question is how can I do this, with function Except is it too slow...

Any ideas ?

SQL Server | Other
0 comments No comments
{count} votes

6 answers

Sort by: Most helpful
  1. hamsi hamsi 1 Reputation point
    2021-09-28T09:57:24.067+00:00

    I will try to give the answer self:

    first i would run a job that concatenate all columns into one string column with special delimiter:

    e.g. "col1value" + "$$--$$--$$" + "col2value" .... into colOverAll ( column)

    and wrote it into the "colOverAll" column, at this point I have ONE String value to
    compare the whole row with another row,
    interesting question here is, how fast is the compare function ....?

    0 comments No comments

  2. Olaf Helper 47,441 Reputation points
    2021-09-28T10:30:24.643+00:00

    The easier (but not more precise) is to calculate a checksum, see https://learn.microsoft.com/en-us/sql/t-sql/functions/checksum-transact-sql?view=sql-server-ver15

    0 comments No comments

  3. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2021-09-28T21:43:53.593+00:00

    I would do:

    SELECT ...
    FROM   A
    FULL JOIN B ON A.keycol = B.keycol
    WHERE NOT EXISTS (SELECT A.* INTERSECT SELECT B.*)
    

    Although I don't how much faster than the EXCEPT it will be. I don't think building the string will that quick either, though. It is not really that comparing all the values for a single row that takes the time. It is the number of rows. But having a key that joins them should help.

    0 comments No comments

  4. YufeiShao-msft 7,146 Reputation points
    2021-09-29T03:09:47.633+00:00

    Hi @hamsi hamsi

    If you can compare two tables using LEFT JOIN:

    SELECT *  
    FROM TESTDB.dbo.FirstComTable F  
    LEFT JOIN TESTDB2.dbo.FirstComTable S  
    ON F.ID =S.ID  
    

    The LEFT JOIN T-SQL keyword is used to retrieve data from two tables, by returning all records from the left table and only the matched records from the right table and null values from the right table when there is no matching between the two tables.

    Or using UNION ALL:
    To return the difference between two tables, you can list the columns to compare in the SELECT statement and use these columns in the ORDER BY clause

    Select * from (   
    Select Id_pk, col1, col2...,coln from table1, ‘Old_table’   
    Union all   
    Select Id_pk, col1, col2...,coln from table2, 'New_tbale'   
    ) cmpr order by Id_pk;  
    
    0 comments No comments

  5. hamsi hamsi 1 Reputation point
    2021-09-29T09:07:00.573+00:00

    Hi guys,

    thanks for all replies.

    My main question is target to the performance, I know it exists a lot of SQL possible queries to

    detect differences, main question here is, which query had the best Performance ?

    Otherwise I can take a part of datas e.g. 2 Million and can stop the time to detect which query had the best Performance....


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.