Share via


Better Approach to avoid DISTINCT/GROUP BY

Question

Wednesday, April 4, 2012 9:39 PM

I have a table with three column

ID              Brand               Price

1               GE                     20
2               GE                     21
3               Sony                  21

Select Distinct Brand From TABLE where price between 19 and 25

if I dont do distinct Then i will see GE twice..my table has around 30K Rows....so what is the best approach to go around this issue (best performance) ..I dont need Group by VS Distinct solution...I need to know best practice to avoid these issues.

www.SaqibKhan.Com

All replies (16)

Wednesday, April 4, 2012 9:45 PM

In this case DISTINCT seems better. 

Please check following link from Pinal Dave 

http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/

Best Wishes, Arbi; Please vote if you find this posting was helpful or Mark it as answered.


Wednesday, April 4, 2012 9:46 PM

I believe both solutions - yours and

SELECT Brand from Table where Price between 19 and 25 GROUP BY Brand 

will perform the same.

For every expert, there is an equal and opposite expert. - Becker's Law

My blog


Wednesday, April 4, 2012 9:48 PM

sorry, but question was NOT to advise which is better (Group by VS Distinct)...question is how to avoid these bad table design patterns and get best performance... I dont want to go through all 30K rows and return DISTINCT on them...if this is THE BEST PRATICE of doing it then fine...otherwise please advise.

www.SaqibKhan.Com


Wednesday, April 4, 2012 9:52 PM

;With CTE 
As
(
Select *, Row_Number() Over(Partition by Brand Order by Price Desc) As RowNo  
From yourTable 
)
Select * From CTE Where RowNo > 1 

This will list all the duplicates for you. Depends on your business logic you can delete dups or add it up or whatever you want to do with those. 

In the next step, you can create a UNIQUE index based on brand or have trigger which will check if that BRAND is duplicate or not and will through an error to UI. 

Best Wishes, Arbi; Please vote if you find this posting was helpful or Mark it as answered.


Wednesday, April 4, 2012 9:59 PM

Thank you Arbi for suggestion...however problem is we get data from a Third Service...which populates this table automatically & on our website we need DISTINCT data. I cant simply delete rows on the fly..since search queries are based upon PRICE and BRAND...I was hoping maybe a approach or a tip to go around this issue.

www.SaqibKhan.Com


Thursday, April 5, 2012 1:06 AM

You're probably not going to be a very happy SQL Developer if you think you never should do SELECT DISTINCT or SELECT GROUP BY.

At the same time, the example (and I realize it's just an example) seems overly simplistic:  GE is a big company.  Are you pulling in Alarm Clocks?  Jet Engines?   Nuclear Power Plants?  You might have fewer duplicates if you were doing a more meaningful query, such as grouping by brand, product categories, and model numbers, perhaps also showing MIN, MAX, and Average pricing.  But to get back to your original question, having to do a Select Distinct or Group by or other form of aggregation isn't a sign of bad design when you have to do summary queries.. 


Thursday, April 5, 2012 3:37 AM

Since you only want distinct brands within a particular price range, I would just go with DISTINCT.  If your concern is performance, DISTINCT should not be too costly on only 30,000 rows.  I just did a test on 32,000+ rows on a table I have with a query like yours (Select Distinct <Column Name> From <Table Name> Where <Some Other Column> Between <Some Value> And <Some Other Value>.  That query took 17ms on my machine. 

Tom


Thursday, April 5, 2012 4:00 AM

CREATE TABLE  #abc
(
id INT,
brand VARCHAR(10),
price MONEY
)

INSERT INTO #abc
VALUES
(1,'GE',20),
(2,'GE',21),
(3,'SONY',21)

;WITH CTE
As
(
SELECT *,ROW_NUMBER() OVER (PARTITION By Brand ORDER BY Price) AS RowNo FROM #abc
)   
SELECT ID,brand,price FROM CTE WHERE RowNo = 1

Thursday, April 5, 2012 7:17 AM

The best approach if you do not want to use DISTINCT & GROUP BY (Both will be executed in exactly the same way) IMO would be - 

  1. Create a Master Table for Brands.
  2. Just save the BrandID in your above table.
  3. And to select the DISTINCT brands, you just need to do the following - 
SELECT BrandName from Brands WHERE BrandID IN (SELECT BrandID FROM PriceTable WHERE Price BETWEEN 19 AND 25)

However, if this is not possible in your environment just because your data in dumped into your tables by a 3rd party webservice, then you can also go for DISTINCT your data @ your front end..i.e. in your case, it would be your application server. Please note, that this will free your DBServer from the extra effort but the same will be shifted to your AppServer. Just see if this works in your case.

Hope, this helps!

-Vinay Pugalia
If a post answers your question, please click "Mark As Answer" on that post or "Vote as Helpful".
Web : Inkey Solutions
Blog : My Blog
Email : Vinay Pugalia


Thursday, April 5, 2012 10:28 AM

according to your comment. I have mentioned three solutions. might be helpfull

Declare @Table table(id int ,brand varchar(5),val money)

insert into @Table
values (1,'GE',20),(2,'GE',21),(3,'Sony',21)

select distinct brand from @Table where val between 19 and 25

select max(id),brand from @Table where val between 19 and 25 group by brand

;with dup
as
(
select rank() over(partition by brand order by val) as dup,brand from @Table
)
select * from dup where dup=1

Shamas Saeed (if Post helpful please mark as Answer) http://shamas-saeed.blogspot.com


Thursday, April 5, 2012 11:38 AM

SQL has a handy Top(1) With Ties for that

CREATE TABLE  #abc
(
id INT,
brand VARCHAR(10),
price MONEY
)

INSERT INTO #abc
VALUES
(1,'GE',20),
(2,'GE',21),
(3,'SONY',21),
(4,'ABC',25)

SELECT Top(1) With Ties * 
FROM #abc
WHERE price is between 19 and 21 
Order By ROW_NUMBER() OVER (PARTITION By Brand ORDER BY Price)

Serg


Thursday, April 5, 2012 12:36 PM

I don't know why you want to avoid distinct or Group by. If you further wants to reduce the cost , then I would advise you to create a non clustered index on brand column with price as included column

Thanks and regards, Rishabh , Microsoft Community Contributor


Friday, April 6, 2012 10:06 AM

I believe that both are work fine :

SELECT Brand from Table where Price between 19 and 25 GROUP BY Brand

SELECT Distinct Brand from Table where Price between 19 and 25

Pratik V Patel


Thursday, April 12, 2012 2:00 PM

SELECT Brand from Table where Price between 19 and 25 GROUP BY Brand

SELECT Distinct Brand from Table where Price between 19 and 25

Point is though, you cannot use aggregate functions with DISTINCT.

GROUP BY article:

http://www.sqlusa.com/bestpractices2008/groupbystatement/

Kalman Toth SQL SERVER & BI TRAINING


Thursday, April 19, 2012 6:34 PM

Any progress?

Kalman Toth SQL SERVER & BI TRAINING


Monday, April 23, 2012 12:55 PM

Why we need to use Distinct. If we can get solution without using it. like 

select max(id),brand from @Table where val between 19 and 25 group by brand

Shamas Saeed (if Post helpful please mark as Answer) http://shamas-saeed.blogspot.com