Several years ago I implemented a customer search index, an N-Gram Search Index. It works well and fast even with large data (in my case more than 13 million addresses), but it requires a lot of additional data (space); see
SQL prefix wildcard search on very large table
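A minimal sketch of how such an n-gram index might look, assuming trigrams (n = 3) and illustrative names (ProductTrigrams, ProductID are not from the original post); it assumes a case-insensitive collation:

```sql
-- Hypothetical sketch of an n-gram (here: trigram) search index.
CREATE TABLE dbo.ProductTrigrams
(
    Trigram   char(3) NOT NULL,
    ProductID int     NOT NULL,
    CONSTRAINT PK_ProductTrigrams PRIMARY KEY CLUSTERED (Trigram, ProductID)
);

-- Populate: one row per 3-character substring of each product name.
INSERT INTO dbo.ProductTrigrams (Trigram, ProductID)
SELECT DISTINCT SUBSTRING(p.ProductName, n.n, 3), p.ProductID
FROM dbo.Products AS p
JOIN (SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_columns) AS n
  ON n.n <= LEN(p.ProductName) - 2;

-- A '%Fountain%' search probes the index with one trigram of the
-- search term and verifies the candidates with LIKE:
SELECT p.*
FROM dbo.Products AS p
JOIN dbo.ProductTrigrams AS t ON t.ProductID = p.ProductID
WHERE t.Trigram = 'fou'          -- any trigram of the search term
  AND p.ProductName LIKE '%Fountain%';
```

The index seek turns the full scan into a lookup of a small candidate set, at the cost of the extra space mentioned above; the trigram table must also be kept in sync with the base table.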
Hi,
I am looking for the best practice for finding text with a prefix wildcard within a very large table.
To give you an idea, the product table has approx. 2 million products. It has clustered as well as non-clustered indexes.
Example of possible values:
- Fountain
- Fountaincap
- Fountain in my garden
- Frozenfountain
- Dirt in my fountain water
- Frozenfountain2
- Fountai with a typo
- There is no better Fountain
The following statement takes too much time, more than 30 seconds:
SELECT * FROM Products WHERE ProductName LIKE '%Fountain%'
I was thinking of using the Full-Text Search service, but a leading wildcard is not allowed there; CONTAINS only supports a trailing wildcard:
SELECT * FROM Products WHERE CONTAINS(ProductName,'Fountain*')
So, any ideas for making these search queries much faster would be much appreciated.
All ideas are welcome, if possible without additional software, to avoid dependencies.
Thank you in advance
Developer technologies | Transact-SQL
6 answers
Sort by: Most helpful
Erland Sommarskog 121.6K Reputation points MVP Volunteer Moderator
2021-01-12T23:07:55.547+00:00 Olaf's idea seems to be the best. I just gave it a quick look, but it seems to be the same idea I once wrote for a book quite a few years ago.
What I would like to add is that if you are using LIKE, you can reduce the execution time if you can use varchar and an SQL collation. With varchar, an SQL collation only has 255 characters, and the comparison rules are a lot simpler than the full Unicode monty that you get with a Windows collation.
Also, an index on the column helps if there are a lot of other big columns in the table, so that there is less data to scan.
But it will still take several seconds. If you build your index as Olaf suggests, you can cut it to sub-second.
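A minimal sketch of this tip, assuming illustrative column and index names (ProductNameSearch, IX_Products_ProductNameSearch) and an nvarchar source column:

```sql
-- Hypothetical sketch: a persisted varchar copy of the column under an
-- SQL collation, so LIKE uses the cheaper non-Unicode comparison rules.
ALTER TABLE dbo.Products ADD ProductNameSearch AS
    CONVERT(varchar(400), ProductName)
        COLLATE SQL_Latin1_General_CP1_CI_AS PERSISTED;

-- The narrow index means the unavoidable scan touches far less data
-- than a scan of the full table would.
CREATE INDEX IX_Products_ProductNameSearch
    ON dbo.Products (ProductNameSearch);

SELECT * FROM dbo.Products
WHERE ProductNameSearch LIKE '%Fountain%';
```

The leading wildcard still forces a scan, but scanning a narrow varchar index under an SQL collation is considerably cheaper than scanning wide nvarchar rows under a Windows collation.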
natig gurbanov 1 Reputation point
2021-01-12T14:16:01.343+00:00 Use a unique clustered index on an indexed view for easier work with large data.
Viorel 122.6K Reputation points
2021-01-12T16:02:47.26+00:00 If you want to research your own hand-made approaches, then maybe create a helper table (@sequences) like this:
declare @table table ( id int, txt varchar(max) )
declare @sequences table ( id int, seq varchar(max), rev bit )

insert @table values
    ( 1, 'Fountain ' ),
    ( 2, 'Fountaincap ' ),
    ( 3, 'Fountain in my garden ' ),
    ( 4, 'Frozenfountain ' ),
    ( 5, 'Dirt in my fountain water ' ),
    ( 6, 'Frozenfountain2 ' ),
    ( 7, 'Fountai with a typo ' ),
    ( 8, 'There is no better Fountain' )
;
with W1 as
(
    select id, trim([value]) as word
    from @table
    cross apply string_split( trim(txt), ' ')
),
W2 as
(
    select id, word as seq, 0 as rev from W1
    union
    select id, reverse(word), 1 from W1
),
S as
(
    select id, seq, rev from W2
    union all
    select id, right(seq, len(seq)-1), rev from S where len(seq) > 1
)
insert @sequences select * from S option (maxrecursion 0)

select * from @sequences order by id, rev
This sample uses local table variables, but the data can be stored in a real table and indexed for 'xxx%' searches. Then, in order to find '%Fountain%', consider this query:
declare @to_find varchar(max) = 'Fountain'

select distinct t.*
from @table t
inner join @sequences s on s.id = t.id
where (s.rev = 0 and s.seq like @to_find + '%')
   or (s.rev = 1 and s.seq like reverse(@to_find) + '%')
which probably can be further optimized.
The approach is less suitable if the content changes frequently.
Tom Phillips 17,771 Reputation points
2021-01-12T18:56:52.173+00:00 A table with 2 million rows is not really that large. Trying to pre-optimize this query may not result in any performance increase. If you have enough RAM, the table will be entirely cached anyway.
What is the data type of ProductName? What version of SQL Server are you using?