Fuzzy logic in T-SQL

Vishal Chitrala 1 Reputation point
2020-11-02T20:56:43.007+00:00

I would like to implement Fuzzy grouping in TSQL . I have a classID and corresponding class sub groups and need to find any potential duplicates in sub groups and if possible find the %age difference between duplicate values. Can this be achieved in T-SQL.

For example : Class1 has multiple materials and I need to find potential duplicates in it.

36971-image.png

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,624 questions
{count} votes

1 answer

Sort by: Most helpful
  1. MelissaMa-MSFT 24,196 Reputation points
    2020-11-03T01:23:54.487+00:00

    Hi @Vishal Chitrala ,

    Thank you so much for posting here.

    You could be able to use the SOUNDEX() function. This will create a value for the classnames that are similar.

    Please refer below example and check whether it is helpful to you.

    declare @fuzzy table   
    (  
    ClassID varchar(20),  
    Classname varchar(20)  
    )  
      
    insert into @fuzzy values  
    ('Class1','ABC'),  
    ('Class1','CDE'),  
    ('Class1','Matr1'),  
    ('Class1','Matr12'),  
    ('Class1','Matr1234'),  
    ('Class2','123'),  
    ('Class2','456'),  
    ('Class2','ABC'),  
    ('Class2','Matr2'),  
    ('Class2','Matr3'),  
    ('Class3','Matr2'),  
    ('Class3','Matr3'),  
    ('Class3','Matr4')  
      
    select t1.Classid,t1.Classname,  
      t2.Stdname ClassName  
    from @fuzzy t1  
    inner join  
    (  
      select Classname as stdName, snd, rn  
      from  
      (  
        select ClassID, Classname,soundex(Classname) snd,  
      row_number() over(partition by ClassID  
                        order by ClassID,soundex(Classname)) rn  
    from @fuzzy  
      ) d  
      where rn = 1  
    ) t2  
      on soundex(t1.Classname) = t2.snd;  
    

    Output:

    Classid	Classname	ClassName  
    Class1	ABC	ABC  
    Class1	Matr1	Matr2  
    Class1	Matr12	Matr2  
    Class1	Matr1234	Matr2  
    Class2	123	123  
    Class2	456	123  
    Class2	ABC	ABC  
    Class2	Matr2	Matr2  
    Class2	Matr3	Matr2  
    Class3	Matr2	Matr2  
    Class3	Matr3	Matr2  
    Class3	Matr4	Matr2  
    

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
    Hot issues October--Users always get connection timeout problem when using multi subnet AG via listener. Especially after failover to another subnet


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.