SQL Regex Pattern Matching Help

Carlton Patterson 41 Reputation points
2022-09-07T13:09:59.783+00:00

Hello Community,

Can someone assist in SQL pattern matching.

I have two tables company and organisations.

In the organsations table there is a field 'cbname'. I would like to find a 85% match between the field 'cbname' and the field in company table called 'companyname'.

I would then like the results to be added to a field in another table, say 'results_table', with a field called 'matched'. The expected output would look something like the following:

238619-matching.png

Sample data is as follows:

Sample Data for organisations table

CREATE TABLE organisations (  
    cbname nvarchar(100),  
    cblegal_name nvarchar(100))  
  
INSERT organisations VALUES  
(N'sevenload',N''),  
(N'Careerjet',N'Careerjet'),  
(N'Gimahhot GmbH',N''),  
(N'RBC Venture Partners',N''),  
(N'DoApp',N''),  
(N'Fidelity Equity Partners',N''),  
(N'Welcome',N'Welcome Software'),  
(N'ExpertFlyer',N''),  
(N'Mento',N''),  
(N'uLiken',N''),  
(N'Bravisa',N''),  
(N'Media Ventures',N'Media Ventures GmbH'),  
(N'Beam It Up Scotty',N''),  
(N'FuelClinic',N''),  
(N'British Interactive Media Association',N''),  
(N'dotMobi',N''),  
(N'NET X AMERICA',N''),  
(N'Pogopixels',N''),  
(N'allyve',N''),  
(N'Boardwalktech',N'Boardwalktech, Inc.'),  
(N'The Economist Group',N''),  
(N'amazingtunes',N''),  
(N'hosting365',N''),  
(N'Panorama',N'Panorama Software'),  
(N'Your Survival',N''),  
(N'Koollage',N''),  
(N'Rosum',N''),  
(N'TruePosition',N'TruePosition Inc.'),  
(N'Wetpaint',N''),  
(N'Zoho',N'Zoho Corporation Pvt. Ltd.'),  
(N'Digg',N'Digg Holdings, LLC'),  
(N'Omidyar Network',N''),  
(N'Meta',N'Meta Platforms, Inc.')  
  
SELECT * FROM organisations  

Sample data for companyname table

CREATE TABLE company (  
    companyname nvarchar(100))  
  
INSERT company VALUES  
(N'Kantar Group'),  
(N'Box-it'),  
(N'Wooribank'),  
(N'TMT Finance News'),  
(N'CLG'),  
(N'Laing O''Rourke'),  
(N'BT - Openreach'),  
(N'Motorola'),  
(N'Broadcast Australia'),  
(N'The Bank of East Asia'),  
(N'Residential Research'),  
(N'Tongyang LIfe'),  
(N'Gothaer Insurance Company'),  
(N'InterPark'),  
(N'Hastings'),  
(N'Meiji Yasuda Life Insurance Company'),  
(N'Nong Hyup'),  
(N'Mitsubishi Corp'),  
(N'Wyoming Retirement System'),  
(N'Zenkyoren'),  
(N'Sumitomo Mitsui Asset Management Company'),  
(N'Fukoku Mutual Life Insurance Company'),  
(N'Taiyo Life Insurance Company'),  
(N'Sumitomo Life'),  
(N'Summit Strategies'),  
(N'Altius Associates Limited'),  
(N'AC Agrar GmbH & Co. KG'),  
(N'Nutritics')  
  
SELECT * FROM company  

As always, your help is much appreciated.

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} votes

Accepted answer
  1. Erland Sommarskog 100.9K Reputation points MVP
    2022-09-07T21:36:56.063+00:00

    The only built-in option for this in SQL Server is the difference() function, which I will have to admit that I don't have much faith in. It's based on soundex values, if that tell you something.

    This sort of advanced matching is best performed outside SQL Server, although, if you have the machine-learning services installed you could send the data to a Python script from SQL Server.

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. CosmogHong-MSFT 22,781 Reputation points Microsoft Vendor
    2022-09-08T07:57:43.597+00:00

    Hi @Carlton Patterson
    Agree with Erland, although there is a Difference function in SQL Sever to get the similarity of two strings, it is sometimes not so efficient.
    For example, in the following sample, although they are all one letter apart, SQLsever believes that Hm&Him is more similar than Hi&Him.

    SELECT DIFFERENCE('Hi','Him') AS Hi_Him  
    	  ,DIFFERENCE('Hm','Him') AS Hm_Him  
    

    Output:
    238962-image.png

    0 comments No comments