Share via


EXCEPT Function

This section explains how the EXCEPT function handles duplicates.

Removing Duplicates

Consider EXCEPT(S1, S2**)**, where:

S1 = {Kansas, Buffalo, Buffalo, Topeka, Topeka, Wichita, Canada, BC}

S2 = {Buffalo, Topeka, USA, Quebec}

This example assumes that you have created two tables ? S1 and S2 ? that contain the fully qualified member names in the sets S1 and S2, respectively. For more information about how to do this, see Literal Sets.

The following query does the EXCEPT operation:

SELECT Name, Newrank AS Rank
FROM
         (((SELECT Name, Rank FROM S1
         WHERE S1.Name NOT IN (SELECT Name FROM S2))
      RANKTOP 1 LEVELS RANKORDER BY Rank RANKGROUP BY Name)
   RANK ROWS AS NewRank RANKORDER BY Rank)
ORDER BY Rank

To summarize, the query does the following:

  1. Selects the rows of S1 that are not in S2.

  2. The RANKTOP operation groups duplicates into separate groups by using the GROUP BY clause, orders each group by Rank, and then picks up the first one. This ensures that duplicates are eliminated from the tail.

  3. The RANK operation orders the filtered rows by Rank and assigns a new rank to each row so that input ordering is preserved.

Note

You cannot use the EXCEPT operation of SQL-92 because its method for duplicate elimination is not guaranteed to remove duplicates from the tail.

Retaining Duplicates

Consider EXCEPT(S1, S2**,** [ALL]). First, create the tables S1 and S2. Then use the following query to order the duplicates without removing them:

SELECT Name, Newrank AS Rank
FROM
      ((SELECT Name, Rank FROM S1
      WHERE S1.Name NOT IN (SELECT Name FROM S2))
   RANK ROWS AS NewRank RANKORDER BY Rank)
ORDER BY Rank

This is simpler because there is no need to remove duplicates. After the query runs that finds the differences between the tables, the RANK operation generates a rank value for each of the rows of the resulting table. The outer SELECT produces this table, ordering it by the NewRank column, which has been renamed as Rank.

Note

You cannot use EXCEPT ALL from SQL-92 because its semantics are as follows: If a row r appears n times in S1 and m times in S2, then r appears exactly p times in the result, where p is the greater of m ? n and 0. Moreover, the order of these p rows is not preserved.

Generalizing for Sets with Arbitrary Dimensionality

Each tuple has more than one element for these sets. Replace the NAME column in the select list of each SELECT statement with a list of columns (equivalent to the number of dimensions in the input sets). Add as many NOT IN clauses in the WHERE as there are columns, one for each column.