How to add Percentage of similar compare two string ?

ahmed salah 3,216 Reputation points
2022-11-01T22:54:04.123+00:00

I work on sql server 2019 i have issue i can't get percentage of two string from table

so i will compare between two string columns (PartText,MaskText)
and i will display similar percentage of two column

so as sample 'TR00123907FG','KB00123907FG' two string different only on first two charachters

so difference will be 80 percent

create table #compareTextPercentage  
(  
PartText varchar(50),  
MaskText varchar(50)  
)  
insert into #compareTextPercentage(PartText,MaskText)  
values  
('TR00123907FG','KB00123907FG'),  
('TR00123907FG','TR00123907FG'),  
('klmhedf43122','50ghlpnkyzhy')  

expected result

256226-image.png

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,690 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
0 comments No comments
{count} votes

Accepted answer
  1. Yitzhak Khabinsky 24,946 Reputation points
    2022-11-02T01:16:45.707+00:00

    Hi @ahmed salah ,

    There is a built-in SQL Server function: DIFFERENCE

    DIFFERENCE compares two different SOUNDEX values, and returns an integer value. This value measures the degree that the SOUNDEX values match, on a scale of 0 to 4. A value of 0 indicates weak or no similarity between the SOUNDEX values; 4 indicates strongly similar.

    Check it out a solution below.
    I used 25 as a coefficient. You can try any other suitable number or even a formula to correlate for your needs.

    SQL

    DECLARE @tbl table (ID INT IDENTITY PRIMARY KEY,  PartText varchar(50),  MaskText varchar(50));  
    INSERT INTO @tbl (PartText,MaskText) VALUES  
     ('TR00123907FG','KB00123907FG'),  
     ('TR00123907FG','TR00123907FG'),  
     ('klmhedf43122','50ghlpnkyzhy');  
      
    SELECT *   
     , Diff = DIFFERENCE(PartText, MaskText) * 25  
    FROM @tbl;  
    

    Output

    +----+--------------+--------------+------+  
    | ID |   PartText   |   MaskText   | Diff |  
    +----+--------------+--------------+------+  
    |  1 | TR00123907FG | KB00123907FG |   50 |  
    |  2 | TR00123907FG | TR00123907FG |  100 |  
    |  3 | klmhedf43122 | 50ghlpnkyzhy |    0 |  
    +----+--------------+--------------+------+  
    
    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. daisuke yanagi 1 Reputation point
    2022-11-02T02:34:41.517+00:00

    Hi @ahmed salah ,

    Sorry.
    I used it for the first time and listed it in the wrong place.
    How about this query?

    SQL

    DECLARE @tbl table (PartText varchar(50),  MaskText varchar(50))  
    INSERT INTO @tbl (PartText,MaskText) VALUES  
    ('TR00123907FG','KB00123907FG'),  
    ('TR00123907FG','TR00123907FG'),  
    ('klmhedf43122','50ghlpnkyzhy');  
      
    with num as (  
    	select 1 + n1.value * 10 + n2.value as n   
    	from   
    	string_split('0,1,2,3,4,5,6,7,8,9',',') as n1  
    ,string_split('0,1,2,3,4,5,6,7,8,9',',') as n2  
    ),temp as (  
    	select distinct  
    	PartText  
    	,MaskText  
    	,TextCnt  
    	,sum(iif(SUBSTRING(PartText,n,1) =SUBSTRING(MaskText,n,1) ,1,0) ) over(partition by IdText) as CompCnt  
    	from @tbl as tbl   
    	cross apply(  
    	select n,max(n) over() TextCnt,concat(PartText,MaskText)IdText from num where n <= len(iif(PartText>MaskText,PartText,MaskText))  
    	) as n  
    )  
      
    select   
    PartText  
    ,MaskText  
    ,round(cast(CompCnt as float) / TextCnt * 100,-1,1) as Precentage  
    from temp  
    order by 1 desc;  
    

    output

     +--------------+--------------+---------+  
     |   PartText   |   MaskText   | PartText|  
     +--------------+--------------+---------+  
     | klmhedf43122 | 50ghlpnkyzhy |       0 |  
     | TR00123907FG | KB00123907FG |      80 |  
     | TR00123907FG | TR00123907FG |     100 |  
     +--------------+--------------+---------+  
    

  2. NikoXu-msft 1,911 Reputation points
    2022-11-02T02:27:24.497+00:00

    Hi @ahmed salah ,

    If you want to get the percentage similarity of 2 strings, you could have a look at this similar thread that provides a method in T-SQL.

    I've tried it and it works pretty well.

    CREATE FUNCTION [dbo].[GetPercentageOfTwoStringMatching]  
    (  
        @string1 NVARCHAR(100)  
        ,@string2 NVARCHAR(100)  
    )  
    RETURNS INT  
    AS  
    BEGIN  
      
        DECLARE @levenShteinNumber INT  
      
        DECLARE @string1Length INT = LEN(@string1)  
        , @string2Length INT = LEN(@string2)  
        DECLARE @maxLengthNumber INT = CASE WHEN @string1Length > @string2Length THEN @string1Length ELSE @string2Length END  
      
        SELECT @levenShteinNumber = [dbo].[LEVENSHTEIN] (   @string1  ,@string2)  
      
        DECLARE @percentageOfBadCharacters INT = @levenShteinNumber * 100 / @maxLengthNumber  
      
        DECLARE @percentageOfGoodCharacters INT = 100 - @percentageOfBadCharacters  
      
        RETURN @percentageOfGoodCharacters  
      
    END  
      
      
    CREATE FUNCTION [dbo].[LEVENSHTEIN](@left  VARCHAR(100),  
                                        @right VARCHAR(100))  
    returns INT  
    AS  
      BEGIN  
          DECLARE @difference    INT,  
                  @lenRight      INT,  
                  @lenLeft       INT,  
                  @leftIndex     INT,  
                  @rightIndex    INT,  
                  @left_char     CHAR(1),  
                  @right_char    CHAR(1),  
                  @compareLength INT  
      
          SET @lenLeft = LEN(@left)  
          SET @lenRight = LEN(@right)  
          SET @difference = 0  
      
          IF @lenLeft = 0  
            BEGIN  
                SET @difference = @lenRight  
      
                GOTO done  
            END  
      
          IF @lenRight = 0  
            BEGIN  
                SET @difference = @lenLeft  
      
                GOTO done  
            END  
      
          GOTO comparison  
      
          COMPARISON:  
      
          IF ( @lenLeft >= @lenRight )  
            SET @compareLength = @lenLeft  
          ELSE  
            SET @compareLength = @lenRight  
      
          SET @rightIndex = 1  
          SET @leftIndex = 1  
      
          WHILE @leftIndex <= @compareLength  
            BEGIN  
                SET @left_char = substring(@left, @leftIndex, 1)  
                SET @right_char = substring(@right, @rightIndex, 1)  
      
                IF @left_char <> @right_char  
                  BEGIN -- Would an insertion make them re-align?  
                      IF( @left_char = substring(@right, @rightIndex + 1, 1) )  
                        SET @rightIndex = @rightIndex + 1  
                      -- Would an deletion make them re-align?  
                      ELSE IF( substring(@left, @leftIndex + 1, 1) = @right_char )  
                        SET @leftIndex = @leftIndex + 1  
      
                      SET @difference = @difference + 1  
                  END  
      
                SET @leftIndex = @leftIndex + 1  
                SET @rightIndex = @rightIndex + 1  
            END  
      
          GOTO done  
      
          DONE:  
      
          RETURN @difference  
      END   
      
      SELECT  *,[dbo].[GetPercentageOfTwoStringMatching](MaskText  ,'TR00123907FG') as[percent] from #compareTextPercentage  
    

    output:
    256229-case112.png

    Best regards
    Niko

    ----------

    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    2 people found this answer helpful.