Parsing suffix from a string

Debilon 431 Reputation points
2022-04-01T20:07:42.667+00:00

I need to parse the suffix JR from a string.
few conditions:

  1. if there is a word after the JR - that name needs to be inserted as the newname
  2. In extract string column - only the letters JR
  3. NewName Empty unless a word appears after JR DECLARE @Bluemchen AS TABLE (string nvarchar(max)) INSERT @Bluemchen (string)
    VALUES ('CHAMBERS DENNIS L JR BRIAY'),
    ('GRAHAM JAMES D JR'),
    ('CHAMBERS DENNIS L JR'); SELECT string, RIGHT(string, 3) AS ExtractString,
    substring(string,1,(CHARINDEX(' ',string + ' ')-1)) as firstName,
    substring(string,2,(CHARINDEX(' ',string + ' ')-2)) as SecondName,
    substring(string,3,(CHARINDEX(' ',string + ' ')-3)) as ThirdName,
    substring(string,4,(CHARINDEX(' ',string + ' ')-4)) as NewName
    from @Bluemchen
    where string is not null and string like '% JR%'

189351-image.png

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} votes

Accepted answer
  1. Erland Sommarskog 100.9K Reputation points MVP
    2022-04-01T21:45:51.003+00:00

    Here is how you can get rid of the JR. This is almost impossible to do in T-SQL directly, but you can run a Python script for the task:

    CREATE TABLE #tbl (string nvarchar(max))
    
    INSERT #tbl (string)
    VALUES ('CHAMBERS DENNIS L JR BRIAY'),
    ('GRAHAM JAMES D JR'),
    ('CHAMBERS DENNIS L JR');
    
    DECLARE @python_script nvarchar(MAX) = 'import pandas, re
    Data["string"] = pandas.Series([ 
          re.sub(r"\s*\bJR\b\s*", r" ", str, flags=re.IGNORECASE)
          for str in Data["string"]])'
    
    EXEC sp_execute_external_script @language = N'Python', -- Set language.
         @input_data_1 = N'SELECT string FROM #tbl',  -- Data sent to Python.
         @input_data_1_name = N'Data',           -- Name of Python variable for input.
         @output_data_1_name = N'Data',          -- Name of Python variable for output.
         @script = @python_script
    
    go
    DROP TABLE #tbl
    

    For this to work, some pre-conditions needs to be fulfilled. I have a blog post that discusses this: https://sqlservergeeks.com/a-tip-about-using-python-for-regular-expressions-in-t-sql-by-erland-sommarskog/ and which also has more examples on using regular expression in Python from T-SQL.

    1 person found this answer helpful.
    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Naomi 7,361 Reputation points
    2022-04-01T20:36:12.693+00:00

    The First, Second and Third names don't make sense - is it intentional?

    DECLARE @tbl AS TABLE (string nvarchar(max))
    
    INSERT @tbl (string)
    VALUES ('CHAMBERS DENNIS L JR BRIAY'),
    ('GRAHAM JAMES D JR'),
    ('CHAMBERS DENNIS L JR'), ('JULIANA JRTEST');
    
    SELECT string, CASE WHEN string LIKE '% JR' OR string LIKE '% JR %' THEN 'JR' END AS ExtractString, -- the name can be JRUNE (so not JR)
    substring(string,1,(CHARINDEX(' ',string + ' ')-1)) as firstName,
    substring(string,2,(CHARINDEX(' ',string + ' ')-2)) as SecondName,
    substring(string,3,(CHARINDEX(' ',string + ' ')-3)) as ThirdName,
    CASE WHEN X.JRPos > 0 THEN SUBSTRING(string, X.JRpos + 4, LEN(string)) end AS NewName
    from @tbl
    CROSS APPLY (SELECT PATINDEX('%[ ]JR[ ]%',string) AS JRPos) X
    where string is not null and string like '% JR%'