Extract Strings from a Bad Data Using CTE / Pattern Matching using MS SQL - 2018

Question

Extract Strings from a Bad Data Using CTE / Pattern Matching using MS SQL - 2018

Dhanekula, Shashikanth 21

I am trying to extract the result strings from the Bad Data column (Please See Attached).

Issues Faced:

The problem is the separator is not consistent.
I cannot use "Case statements" because a new pattern might be added in the future.
Probable solution: I know using CTE could be the solution but I am not very well versed in using CTE.

Probable example for CTE code: https://sqlserverfast.com/blog/hugo/2019/04/removing-multiple-patterns-from-a-string/

Any help is appreciated.

Thank you

Sample data:

Sample code:

CREATE TABLE [Sample_Data]  
(  
 [PK] int IDENTITY(1,1) PRIMARY KEY,  
 [id] [varchar](255) NULL,  
 [Bad_data] [varchar](200) NULL,  
)  
GO  
  
INSERT INTO [Sample_Data]  
           ([id]  
           ,[Bad_data])  
     VALUES  
           ('1','LTY - REEL - Mark Hudson - 999.265.6233'),  
 ('23','IOO Craig David 123-456-7891'),  
 ('28','LTK - MEKL - MIKAL MEKL - 123.558.1234'),  
 ('36','RAW- RANDY JHON- 123-456-7891'),  
 ('2','MNQR- JERRY RAM 123-456-7891'),  
 ('I12','LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891'),  
 ('L-324','LTK - MEKL - Joseph Rick - 1234567891'),  
 ('JKO','MDKL JERRY RAM 123 456 7891'),  
 ('9','MUINK- OSCAR PISTARI- 123-456-7891'),  
 ('10','MUINK-Leo David'),  
 ('11','ALT - Stephen Smith  123-456-7891')  
  
GO

LiHongMSFT-4306 31,566 Reputation points

2022-08-30T05:52:59.313+00:00

Hi @Dhanekula, Shashikanth
Could you please share the sample table structure (CREATE TABLE …) and (INSERT INTO …)? So that we’ll get a right direction and make some test.
And your SQL sever Version.

Dhanekula, Shashikanth 21

Hello Sir,
Thank you for your response. MS SQL Server version is 2018 - 14.0.17289.0 And here is the code attached.

CREATE TABLE [Sample_Data]
(
[PK] int IDENTITY(1,1) PRIMARY KEY,
[id] varchar NULL,
[Bad_data] varchar NULL,
)
GO

INSERT INTO [Sample_Data]  
           ([id]  
           ,[Bad_data])  
     VALUES  
           ('1','LTY - REEL - Mark Hudson - 999.265.6233'),  
			('23','IOO Craig David 123-456-7891'),  
			('28','LTK - MEKL - MIKAL MEKL - 123.558.1234'),  
			('36','RAW- RANDY JHON- 123-456-7891'),  
			('2','MNQR- JERRY RAM 123-456-7891'),  
			('I12','LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891'),  
			('L-324','LTK - MEKL - Joseph Rick - 1234567891'),  
			('JKO','MDKL JERRY RAM 123 456 7891'),  
			('9','MUINK- OSCAR PISTARI- 123-456-7891'),  
			('10','MUINK-Leo David'),  
			('11','ALT - Stephen Smith  123-456-7891')  
  
GO

Accepted answer

1 additional answer

Your answer

LiHongMSFT-4306 31,566 Reputation points

2022-08-30T05:52:59.313+00:00

Hi @Dhanekula, Shashikanth
Could you please share the sample table structure (CREATE TABLE …) and (INSERT INTO …)? So that we’ll get a right direction and make some test.
And your SQL sever Version.
Dhanekula, Shashikanth 21 Reputation points

2022-08-30T06:24:06.153+00:00

Hello Sir,
Thank you for your response. MS SQL Server version is 2018 - 14.0.17289.0 And here is the code attached.

CREATE TABLE [Sample_Data]
(
[PK] int IDENTITY(1,1) PRIMARY KEY,
[id] varchar NULL,
[Bad_data] varchar NULL,
)
GO

INSERT INTO [Sample_Data] ([id] ,[Bad_data]) VALUES ('1','LTY - REEL - Mark Hudson - 999.265.6233'), ('23','IOO Craig David 123-456-7891'), ('28','LTK - MEKL - MIKAL MEKL - 123.558.1234'), ('36','RAW- RANDY JHON- 123-456-7891'), ('2','MNQR- JERRY RAM 123-456-7891'), ('I12','LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891'), ('L-324','LTK - MEKL - Joseph Rick - 1234567891'), ('JKO','MDKL JERRY RAM 123 456 7891'), ('9','MUINK- OSCAR PISTARI- 123-456-7891'), ('10','MUINK-Leo David'), ('11','ALT - Stephen Smith 123-456-7891') GO

Answer 1

Hi @Dhanekula, Shashikanth
Try this code:

;WITH CTE1 AS  
(  
 SELECT *,SUBSTRING(Bad_Data,1,PATINDEX('%-%',REPLACE(Bad_Data,' ','-'))-1)AS Result1  
 FROM Sample_Data  
),CTE2 AS  
(  
SELECT id,Bad_data,Result1  
      ,TRIM(TRANSLATE(RIGHT(Bad_Data,LEN(Bad_Data)-LEN(Result1)),'0123456789.-','            '/*12 spaces here*/))AS New_BadData  
FROM CTE1  
)  
SELECT id,Bad_data,Result1  
      ,CASE WHEN LEN(New_BadData)-LEN(REPLACE(New_BadData, ' ', ''))>1   
         THEN REVERSE(SUBSTRING(REVERSE(New_BadData),1,CHARINDEX(' ',REVERSE(New_BadData))+CHARINDEX(' ',REVERSE(TRIM(LEFT(New_BadData,LEN(New_BadData)-CHARINDEX(' ',REVERSE(New_BadData))))))-1))  
 ELSE New_BadData END AS Result2  
FROM CTE2

Best regards,
LiHong

If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Dhanekula, Shashikanth 21 Reputation points

2022-08-30T15:13:56.763+00:00

Thank you so much this solution solves most of my problem!!

Answer 2

Hi @Dhanekula, Shashikanth ,

Please try the following solution.

SQL

-- DDL and sample data population, start  
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, bad_data VARCHAR(200));  
INSERT INTO @tbl (bad_data) VALUES  
('LTY - REEL - Mark Hudson - 999.265.6233'),  
('IOO Craig David 123-456-7891'),  
('LTK - MEKL - MIKAL MEKL - 123.558.1234'),  
('RAW- RANDY JHON- 123-456-7891'),  
('MNQR- JERRY RAM 123-456-7891'),  
('LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891'),  
('LTK - MEKL - Joseph Rick - 1234567891'),  
('MDKL JERRY RAM 123 456 7891'),  
('MUINK- OSCAR PISTARI- 123-456-7891'),  
('MUINK-Leo David'),  
('ALT - Stephen Smith  123-456-7891');  
-- DDL and sample data population, end  
  
DECLARE @separator CHAR(1) = '-';  
  
SELECT t.*  
 , c.value('(/root/r[1]/text())[1]', 'VARCHAR(20)') AS result1  
 , IIF(cnt = 3  
 , c.query('(data(/root/r[position()=(2,3)]/text()))').value('text()[1]', 'VARCHAR(20)')  
 , c.query('(data(/root/r[position()=(3,4)]/text()))').value('text()[1]', 'VARCHAR(20)')) AS result2  
FROM @tbl AS t  
CROSS APPLY (SELECT REPLACE(REPLACE(TRIM('-' FROM REPLACE(LEFT(bad_data, PATINDEX('%[0-9]%', Bad_Data + '0') - 1), SPACE(1), @separator)),'--','-'),'--','-')) AS t1(tokens)  
CROSS APPLY (SELECT TRY_CAST('<root><r>' +   
      REPLACE(tokens, @separator, '</r><r>') +   
      '</r></root>' AS XML)) AS t2(c)  
CROSS APPLY (SELECT c.value('count(/root/r)', 'INT')) AS t3(cnt);

Output

+----+-----------------------------------------------+---------+-------------------+  
| ID |                   bad_data                    | result1 |      result2      |  
+----+-----------------------------------------------+---------+-------------------+  
|  1 | LTY - REEL - Mark Hudson - 999.265.6233       | LTY     | Mark Hudson       |  
|  2 | IOO Craig David 123-456-7891                  | IOO     | Craig David       |  
|  3 | LTK - MEKL - MIKAL MEKL - 123.558.1234        | LTK     | MIKAL MEKL        |  
|  4 | RAW- RANDY JHON- 123-456-7891                 | RAW     | RANDY JHON        |  
|  5 | MNQR- JERRY RAM 123-456-7891                  | MNQR    | JERRY RAM         |  
|  6 | LTK - MEKL - TERRANCE LAWRENCE - 123.456.7891 | LTK     | TERRANCE LAWRENCE |  
|  7 | LTK - MEKL - Joseph Rick - 1234567891         | LTK     | Joseph Rick       |  
|  8 | MDKL JERRY RAM 123 456 7891                   | MDKL    | JERRY RAM         |  
|  9 | MUINK- OSCAR PISTARI- 123-456-7891            | MUINK   | OSCAR PISTARI     |  
| 10 | MUINK-Leo David                               | MUINK   | Leo David         |  
| 11 | ALT - Stephen Smith  123-456-7891             | ALT     | Stephen Smith     |  
+----+-----------------------------------------------+---------+-------------------+

Share via

Extract Strings from a Bad Data Using CTE / Pattern Matching using MS SQL - 2018

1 additional answer

Your answer