Extract Strings from a Bad Data Using CTE / Pattern Matching using MS SQL - 2018

Dhanekula, Shashikanth 21 Reputation points
2022-08-30T04:44:41.623+00:00

I am trying to extract the result strings from the Bad Data column (Please See Attached).

Issues Faced:

The problem is the separator is not consistent.
I cannot use "Case statements" because a new pattern might be added in the future.
Probable solution: I know using CTE could be the solution but I am not very well versed in using CTE.

Probable example for CTE code: https://sqlserverfast.com/blog/hugo/2019/04/removing-multiple-patterns-from-a-string/

Any help is appreciated.

Thank you

Sample data:235971-bad-data.png

Sample code:

CREATE TABLE [Sample_Data]  
(  
 [PK] int IDENTITY(1,1) PRIMARY KEY,  
 [id] [varchar](255) NULL,  
 [Bad_data] [varchar](200) NULL,  
)  
GO  
  
INSERT INTO [Sample_Data]  
           ([id]  
           ,[Bad_data])  
     VALUES  
           ('1','LTY - REEL - Mark Hudson - 999.265.6233'),  
 ('23','IOO Craig David 123-456-7891'),  
 ('28','LTK - MEKL - MIKAL MEKL - 123.558.1234'),  
 ('36','RAW- RANDY JHON- 123-456-7891'),  
 ('2','MNQR- JERRY RAM 123-456-7891'),  
 ('I12','LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891'),  
 ('L-324','LTK - MEKL - Joseph Rick - 1234567891'),  
 ('JKO','MDKL JERRY RAM 123 456 7891'),  
 ('9','MUINK- OSCAR PISTARI- 123-456-7891'),  
 ('10','MUINK-Leo David'),  
 ('11','ALT - Stephen Smith  123-456-7891')  
  
GO  
  
Developer technologies Transact-SQL
SQL Server Other
{count} votes

Accepted answer
  1. LiHongMSFT-4306 31,566 Reputation points
    2022-08-30T07:16:19.423+00:00

    Hi @Dhanekula, Shashikanth
    Try this code:

    ;WITH CTE1 AS  
    (  
     SELECT *,SUBSTRING(Bad_Data,1,PATINDEX('%-%',REPLACE(Bad_Data,' ','-'))-1)AS Result1  
     FROM Sample_Data  
    ),CTE2 AS  
    (  
    SELECT id,Bad_data,Result1  
          ,TRIM(TRANSLATE(RIGHT(Bad_Data,LEN(Bad_Data)-LEN(Result1)),'0123456789.-','            '/*12 spaces here*/))AS New_BadData  
    FROM CTE1  
    )  
    SELECT id,Bad_data,Result1  
          ,CASE WHEN LEN(New_BadData)-LEN(REPLACE(New_BadData, ' ', ''))>1   
             THEN REVERSE(SUBSTRING(REVERSE(New_BadData),1,CHARINDEX(' ',REVERSE(New_BadData))+CHARINDEX(' ',REVERSE(TRIM(LEFT(New_BadData,LEN(New_BadData)-CHARINDEX(' ',REVERSE(New_BadData))))))-1))  
     ELSE New_BadData END AS Result2  
    FROM CTE2  
    

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


1 additional answer

Sort by: Most helpful
  1. Yitzhak Khabinsky 26,586 Reputation points
    2022-08-30T15:25:57.06+00:00

    Hi @Dhanekula, Shashikanth ,

    Please try the following solution.

    SQL

    -- DDL and sample data population, start  
    DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, bad_data VARCHAR(200));  
    INSERT INTO @tbl (bad_data) VALUES  
    ('LTY - REEL - Mark Hudson - 999.265.6233'),  
    ('IOO Craig David 123-456-7891'),  
    ('LTK - MEKL - MIKAL MEKL - 123.558.1234'),  
    ('RAW- RANDY JHON- 123-456-7891'),  
    ('MNQR- JERRY RAM 123-456-7891'),  
    ('LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891'),  
    ('LTK - MEKL - Joseph Rick - 1234567891'),  
    ('MDKL JERRY RAM 123 456 7891'),  
    ('MUINK- OSCAR PISTARI- 123-456-7891'),  
    ('MUINK-Leo David'),  
    ('ALT - Stephen Smith  123-456-7891');  
    -- DDL and sample data population, end  
      
    DECLARE @separator CHAR(1) = '-';  
      
    SELECT t.*  
     , c.value('(/root/r[1]/text())[1]', 'VARCHAR(20)') AS result1  
     , IIF(cnt = 3  
     , c.query('(data(/root/r[position()=(2,3)]/text()))').value('text()[1]', 'VARCHAR(20)')  
     , c.query('(data(/root/r[position()=(3,4)]/text()))').value('text()[1]', 'VARCHAR(20)')) AS result2  
    FROM @tbl AS t  
    CROSS APPLY (SELECT REPLACE(REPLACE(TRIM('-' FROM REPLACE(LEFT(bad_data, PATINDEX('%[0-9]%', Bad_Data + '0') - 1), SPACE(1), @separator)),'--','-'),'--','-')) AS t1(tokens)  
    CROSS APPLY (SELECT TRY_CAST('<root><r>' +   
          REPLACE(tokens, @separator, '</r><r>') +   
          '</r></root>' AS XML)) AS t2(c)  
    CROSS APPLY (SELECT c.value('count(/root/r)', 'INT')) AS t3(cnt);  
    

    Output

    +----+-----------------------------------------------+---------+-------------------+  
    | ID |                   bad_data                    | result1 |      result2      |  
    +----+-----------------------------------------------+---------+-------------------+  
    |  1 | LTY - REEL - Mark Hudson - 999.265.6233       | LTY     | Mark Hudson       |  
    |  2 | IOO Craig David 123-456-7891                  | IOO     | Craig David       |  
    |  3 | LTK - MEKL - MIKAL MEKL - 123.558.1234        | LTK     | MIKAL MEKL        |  
    |  4 | RAW- RANDY JHON- 123-456-7891                 | RAW     | RANDY JHON        |  
    |  5 | MNQR- JERRY RAM 123-456-7891                  | MNQR    | JERRY RAM         |  
    |  6 | LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891 | LTK     | TERRANCE LAWRENCE |  
    |  7 | LTK - MEKL - Joseph Rick - 1234567891         | LTK     | Joseph Rick       |  
    |  8 | MDKL JERRY RAM 123 456 7891                   | MDKL    | JERRY RAM         |  
    |  9 | MUINK- OSCAR PISTARI- 123-456-7891            | MUINK   | OSCAR PISTARI     |  
    | 10 | MUINK-Leo David                               | MUINK   | Leo David         |  
    | 11 | ALT - Stephen Smith  123-456-7891             | ALT     | Stephen Smith     |  
    +----+-----------------------------------------------+---------+-------------------+  
    
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.