To eliminate duplications on some fields

BenTam 1,561 Reputation points
2023-01-28T07:54:11.37+00:00

Dear All,

I have a table as shown below. You can see some duplications. My question is how to eliminate the duplications. The same question was asked on https://learn.microsoft.com/en-us/answers/questions/1163799/how-to-eliminate-duplication. Now the situation is different, there are a lot of data (over 93000 rows) and many more columns.

The Data:CourseTables

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,700 questions
{count} votes

Accepted answer
  1. Jingyang Li 5,891 Reputation points
    2023-02-02T16:02:00.51+00:00

    You can study WINDOW functions to solve this issue.

    
    
    
    	; WITH mycte AS (
        SELECT courseid, sjcode, courseno, sjversion,trainerid, tutorid, remark, students,
    	 row_number()Over(PARTITION BY sjcode, courseno, sjversion ORDER BY newid()) rn
       FROM  course
    )
    SELECT courseid, sjcode, courseno, sjversion,trainerid, tutorid, remark,students
    FROM  mycte
    WHERE rn = 1
    
    1 person found this answer helpful.

2 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 100.9K Reputation points MVP
    2023-01-28T10:21:43.5866667+00:00
    ; WITH numbering AS (
        SELECT ...., rowno = row_number(PARTITION BY col1, col2, col3 ORDER BY col4)
       FROM  tbl
    )
    SELECT ... 
    FROM  numbering
    WHERE rowno = 1
    
    

    In the PARTITION BY clause you list the columns where you have duplicates. In the ORDER BY clause you give a criteria for which row of the duplicates to pick. For instance, if you want the most recent row by createdate you would say ORDER BY createdate DESC. If you don't care, you can say ORDER BY (SELECT 1).


  2. PercyTang-MSFT 12,426 Reputation points Microsoft Vendor
    2023-01-30T06:12:08.89+00:00

    Hi

    Originally, I wanted to design such a query for you.

    select max(courseid),sjcode,courseno,sjversion,
           max(trainerid),max(tutorid),max(remark),max(students) 
    	   from course group by sjcode,courseno,sjversion;
    

    But your remark field is of type text and cannot use the max/min function.

    So Erland's approach is more suitable here.

    You should modify your code based on the example he provides.

    Best regards,

    Percy Tang