DISTINCTROW example correct ??

Question

DISTINCTROW example correct ??

Anonymous

I'm following the DISTINCTROW example at http://support.office.com/en-sg/article/ALL-DISTINCT-DISTINCTROW-TOP-Predicates-24f2a47d-a803-4c7c-8e81-756fe298ce57.

I created the following tables:

orders

------

ID company order

2 1 A

3 2 A

4 1 B

companies

---------

ID company

1 1

The following causes company '1' to show up twice:

SELECT DISTINCTROW orders.company

FROM orders

INNER JOIN companies

ON orders.company=companies.company

That doesn't match what the help page says, even though it does what I would expect a qualifier named "DISTINCTROW" would do. Maybe it's just an inappropriate example, or is there something more fundamental that I'm missing here?

For example, if I use DISTINCT instead of DISTINCTROW, I get rid of redundant instances of company (as expected). This begs the question of what situation benefit from the use of DISTINCTROW.

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments

Answer accepted by question author

Anonymous

Referencing table and referenced table are terms used frequently in the academic literature concerning the database relational model. I tend use them rather than more colloquial terms because they unambiguously describe the nature of the table in a relationship.

When you say '..........the join field values are non-repeating in one of the two tables, but not necessarily' what you are describing is a many-to-many relationship type, usually binary but often ternary or occasionally greater. This exists conceptually between two or more entity types, e.g. between Orders and Products, but should never be modelled in this way, always by a third table, e.g. OrderDetails which resolves the relationship type into two or more one-to-many relationship types. This is the only way the relationship type can be enforced of course.

Whether the use of the DISTINCTROW predicate rather than a subquery would enhance performance I can't say, but in a DELETE query it's unnecessary, even where a JOIN is used. The query is not independently deleting each returned instance of a single row, but the row per se. Even that is not really the case as the whole 'query' (called so in Access, but in SQL-speak not really so) is a set operation of course. In a SELECT query I'm not aware of any benchmarking with regard to its use vis-à-vis a subquery. It's worth noting, however, that the EXISTS predicate is very efficient because, by allowing the use of the * wildcard character in the subquery, it gives the optimizer full rein.

0 comments

Answer accepted by question author

Anonymous

The help article is more or less correct, but not very well worded and far from complete. The reason you are getting duplicate rows is because you are returning a column from the referencing table, whereas the example in the article returns a column only from the referenced table. The article alludes to this with 'DISTINCTROW is ignored if your query includes only one table, or if you output fields from all tables', but instead of 'all tables' it really should say 'the referencing table'.

You can see the difference, using Northwind, with:

SELECT DISTINCTROW Company

FROM Customers INNER JOIN Orders

ON Customers.ID = Orders.[Customer ID]

ORDER BY Company;

and:

SELECT DISTINCTROW Company, [Customer ID]

FROM Customers INNER JOIN Orders

ON Customers.ID = Orders.[Customer ID]

ORDER BY Company;

The former suppresses the duplication, whereas the latter does not. This behaviour gives a clue to the purpose of the DISTINCTROW predicate. It stems from the early days of Access when it did not support subqueries. The DISTINCTROW predicate was a means of getting around this by providing a means of returning a single instance of each row from the referenced table where there are multiple matching rows in the referencing table. The same result would be obtained with the DISTINCT predicate of course, but the crucial difference is that the DISTINCTROW predicate would return an updatable result table, whereas the DISTINCT predicate would not.

With the introduction of support for subqueries the DISTINCTROW predicate was rendered pretty much redundant many years ago. It does nothing in your corrected DELETE query and can be omitted.

An alternative for what you want to do would be:

DELETE *

FROM Orders

WHERE NOT EXISTS

(SELECT *

FROM Customers

WHERE Customers.Company = Orders.Company);

0 comments

3 additional answers

Answer 1

I have to admit to never taking formal relational database *theory* before, though I've done a coding course and read up on entity-relationship diagram variants (in the context of ontologies, and I find those concepts to be very theoretical). I've seen the distinction between one-to-many and many-to-many in other academic contexts. I think my roadblock here is the lack of visibility into how DELETE works, which might help me understand your phrasing "not independently deleting each returned instance of a single row, but the row per se". But I'm OK with not knowing all the details. And I get what you say about the wildcard making the subquery much more efficient than otherwise.

About the non-necessity of DISTINCTROW, my experimentation with DELETE seems to indicate that DISTINCTROW is needed. I reconstructed the toy example:

orders

------

ID company order

5 1 A

6 1 B

7 2 A

8 2 B

companies

---------

ID company

1 1

The DISTINCTROW key word is needed in the following 2 slightly different queries:

DELETE DISTINCTROW orders.*

FROM orders INNER JOIN companies

ON orders.company=companies.company

WHERE companies.company=1;

DELETE DISTINCTROW orders.*

FROM orders LEFT JOIN companies

ON orders.company=companies.company

WHERE companies.company IS NULL;

Without it, I get the error message "Could not delete from specified tables."

Answer 2

Ken,

Thank you for your explanations. I installed the Northwind example, but found it a bit big for comprehension of the details, so I referred to an example at http://www.fmsinc.com/microsoftaccess/query/distinct_vs_distinctrow/unique_values_records.asp. That merely confirmed to me what I suspected about how DISTINCTROW works. I'm just not sure why it is needed, even in your example.

I was a bit confused by the term referencing and reference tables. I try to think of an INNER JOIN as a symmetric operation, where *most* of the time, the join field values are non-repeating in one of the two tables, but not necessarily. I am assuming that the table with the non-repeating values in the join field is the reference[d] table. With that in mind, your explanation matches the picture described in the above link.

You ended by explaining how a correlated subquery makes DISTINCTROW unnecessary. However, my (admittedly incomplete) knowledge about SQL is that correlated queries are individual queries run for each record of the outer query. Would I be correct, then, in assuming that it is slower than a JOIN with a DISTINCTROW?

I feel that I know how DISTINCTROW behaves in a SELECT query, but for a DELETE query, the (possibly slower) correlated subquery is way more understandable. No, scratch that, it's simply understandable -- because for the case without correlated subquery, I still don't see how DISTINCTROW is intended to work in a DELETE query. So I'm a bit paranoid about relying on it for DELETEs, even if it is faster (which is not to imply that this is in fact true).

Maybe the reason why it is needed is as follows. Say you are aiming to delete records in a referencing table, and imagine that up to "n" records in the referencing table match a join field value in the reference table. I surmise that you need all those matches in order to delete all the matching records in the referencing table. If this is true, then that would explain why you need the duplicates from DISTINCTROW. And the documentation for DISTINCTROW in no ways conveys this. As well, that picture is only helpful if the so-called reference table indeed has non-repeating values in the join field.

Answer 3

Where I'm trying to go with the above is that eventually, I want something like:

DELETE DISTINCTROW orders.*

FROM orders LEFT JOIN companies

ON orders.company=companies.company

WHERE orders.company IS NULL

As written, however, the above code does not delete records in orders that lack a match with a record companies. I thought that was the whole point of a left join with a check for null in the left table's join field.

Afternote: I got the above coding idiom wrong. The WHERE clause should check the condition companies.company IS NULL. *That* is the idiom for a LEFT JOIN. I'm keeping this post up in case another person falls into a coma like I did and got the idiom backward (maybe save someone just a bit of time).

However, I'd still be interested in an answer to the original post about a practical illustration of the use of DISTINCTROW.

Share via

DISTINCTROW example correct ??

3 additional answers