remove duplicate rows

Potter123 260 Reputation points
2023-09-26T02:21:24.0566667+00:00

We have a big table with thousands of rows. There are some duplicate lines. Is there a suitable way to help us remove these duplicate rows without moving other non-duplicate rows?

Any ideas or tutorials? Thank.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,630 questions
0 comments No comments
{count} votes

Accepted answer
  1. PercyTang-MSFT 12,501 Reputation points Microsoft Vendor
    2023-09-26T02:26:08.5866667+00:00

    Hi @Potter123

    You can refer to this document.

    https://www.sqlshack.com/different-ways-to-sql-delete-duplicate-rows-from-a-sql-table/

    If you still have questions after reading the documentation, I suggest that you show some of the data so that we can design targeted deletion statements.

    Best regards,

    Percy Tang

    1 person found this answer helpful.
    0 comments No comments

4 additional answers

Sort by: Most helpful
  1. Olaf Helper 43,901 Reputation points
    2023-09-26T04:31:06.13+00:00

    We have a big table with thousands of rows.

    Thousands of records is not big, it's very small.

    Anyway, without knowing the table design and how you define "duplicate", we can not suggest.

    Please post table design as DDL, some sample data as DML statement and the expected result.

    1 person found this answer helpful.
    0 comments No comments

  2. soumen barua 5 Reputation points
    2023-09-26T07:56:24.24+00:00

    You can write queries that will only return unique rows.

    There are other strategies as well. For example you can create a blank table with a unique clustered index with ignore_dup_key option set. Then insert into that table from main table and duplicate rows will be filtered out.

    Rgds.

    1 person found this answer helpful.
    0 comments No comments

  3. Erland Sommarskog 109.8K Reputation points MVP
    2023-09-26T22:02:47.6733333+00:00

    A common way to remove duplicates is this:

    ; WITH numbering AS (
        SELECT *, rowno = row_number() OVER(PARTITION BY col1, col2, col3 ORDER BY col4 DESC)
        FROM tbl
    )
    DELETE numbering WHERE rowno >1
    

    In the PARTITION BY clause you list the column you wan to be unique after the cleanup. In the ORDER BY clause you select which of these rows you pick. For instance, col4 could be a timestamp column, and you want the most recent column. (Which is why I have DESC in my example.

    1 person found this answer helpful.
    0 comments No comments

  4. Potter123 260 Reputation points
    2023-09-27T03:00:50.5966667+00:00

    Thanks all!

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.