Split String Dynamically

Debilon 431 Reputation points
2022-04-02T20:11:37.64+00:00
  1. I need to split the string dynamically, each word in a different column
  2. If the string includes the Suffix JR or SR i need to move them into a column name Suffix
  3. Any word after the Suffix needs to move to a column name NewWord
  4. Length of the string and location of the Suffix in the string is unknown.
  5. Length of words in the string can be 1

Any Ideas will be welcomed.

CREATE TABLE #tbl (string nvarchar(max))
INSERT #tbl (string)
VALUES ('String'),
('String one '),
('String one two'),
('String one two three'),
('String one two three four'),
('String one two JR NewWord')

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} votes

Accepted answer
  1. Yitzhak Khabinsky 24,941 Reputation points
    2022-04-03T02:00:07.36+00:00

    Hi @Debilon ,

    Because a minimal reproducible example was not provided, I am shooting from the hip.

    The solution is using tokenization via XML/XQuery.

    SQL

    USE tempdb;  
    GO  
      
    -- DDL and sample data population, start  
    DROP TABLE IF EXISTS dbo.tbl;  
      
    CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, tokens NVARCHAR(MAX));  
    INSERT INTO dbo.tbl (tokens) VALUES  
    (N'String'),  
    (N'String one'),  
    (N'String one two'),  
    (N'String one two three'),  
    (N'String one two three four'),  
    (N'String one two JR NewWord');  
    -- DDL and sample data population, end  
      
    DECLARE @xml XML  
    	, @separator CHAR(1) = SPACE(1)  
    	, @CrLf CHAR(2) = CHAR(13) + CHAR(10)  
    	, @SQL AS NVARCHAR(MAX);  
      
    -- count total number of columns  
    DECLARE @cnt INT, @i INT;  
      
    SET @cnt = (SELECT MAX(cnt)  
    FROM dbo.tbl AS t  
    CROSS APPLY (SELECT TRY_CAST('<root><r>' +   
    		REPLACE(tokens, @separator, '</r><r>') +   
    		'</r></root>' AS XML)) AS t1(c)  
    CROSS APPLY (SELECT c.value('count(/root/r)', 'INT')) AS t2(cnt));  
      
    -- compose T-SQL statement  
    SET @SQL = 'SELECT t.*, cnt, t3.suffix' + @CrLf;  
      
    -- loop through   
    SET @i = 1;  
    WHILE @i <= @cnt BEGIN  
       SET @SQL += ', c.value(''(/root/r[' + CAST(@i AS VARCHAR(5)) + ']/text())[1]'', ''NVARCHAR(20)'') AS col' + CAST(@i AS VARCHAR(5)) + @CrLf  
      
       SET @i += 1;  
    END  
      
    SET @SQL += ', c.value(''(/root/r[sql:column("t3.suffix")]/text())[1]'', ''NVARCHAR(20)'') AS suffix  
    	, IIF(t3.suffix > 0, c.value(''(/root/r[sql:column("t3.suffix") + 1]/text())[1]'', ''NVARCHAR(20)''), NULL) AS NewWord  
    FROM dbo.tbl AS t  
    CROSS APPLY (SELECT TRY_CAST(''<root><r>'' +   
    		REPLACE(tokens, SPACE(1), ''</r><r>'') +   
    		''</r></root>'' AS XML)) AS t1(c)  
    CROSS APPLY (SELECT c.value(''count(/root/r)'', ''INT'')) AS t2(cnt)  
    CROSS APPLY (SELECT c.query(''  
    	for $x in /root/r[text()=("JR","SR")]  
    	let $pos := count(/root/r[. << $x[1]]) + 1  
    	return $pos  
    '').value(''.'',''INT'')) AS t3(suffix);';  
      
    PRINT @SQL;  
    EXEC sys.sp_executesql @SQL;  
    

    Output

    +----+---------------------------+-----+--------+--------+------+------+-------+---------+--------+---------+  
    | ID |          tokens           | cnt | suffix |  col1  | col2 | col3 | col4  |  col5   | suffix | NewWord |  
    +----+---------------------------+-----+--------+--------+------+------+-------+---------+--------+---------+  
    |  1 | String                    |   1 |      0 | String | NULL | NULL | NULL  | NULL    | NULL   | NULL    |  
    |  2 | String one                |   2 |      0 | String | one  | NULL | NULL  | NULL    | NULL   | NULL    |  
    |  3 | String one two            |   3 |      0 | String | one  | two  | NULL  | NULL    | NULL   | NULL    |  
    |  4 | String one two three      |   4 |      0 | String | one  | two  | three | NULL    | NULL   | NULL    |  
    |  5 | String one two three four |   5 |      0 | String | one  | two  | three | four    | NULL   | NULL    |  
    |  6 | String one two JR NewWord |   5 |      4 | String | one  | two  | JR    | NewWord | JR     | NewWord |  
    +----+---------------------------+-----+--------+--------+------+------+-------+---------+--------+---------+  
    

1 additional answer

Sort by: Most helpful
  1. Naomi 7,361 Reputation points
    2022-04-03T19:51:47.853+00:00

    This is an old blog of mine which may demonstrate one of the techniques I use for splitting names (but still assumes some predefined formats)

    https://blogs.lessthandot.com/index.php/datamgmt/datadesign/parsing-fullname-field-to-individual/

    0 comments No comments