How to replace embedded commas in a string in a table (REVISED)

Coreysan 1,811 Reputation points
2023-06-30T23:37:18.5466667+00:00

(Unfortunately I'm stuck with using Sql Server 2012 R2, so it doesn't support BULK INSERT "FORMAT='CSV' ".)

I'm uploading data from CSV, but right now I upload each record into one large string:

create table (wrecord varchar(4000))

Then I would use T-SQL to edit each string. Is there some slick way of editing out embedded commas inside a sub-string that has dbl-quotes?

I realize it would be a hack, but I'm hopeful for something.

SQL Server | Other
{count} votes

Accepted answer
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2023-07-01T21:08:13.3666667+00:00

    Three alternatives:

    1. Write a CLR procedure for the task. There are classes in the Microsoft.VisualBasic namespace to parse CSV files. As it happens, I have an article on my web site where I have demos that uses these classes, although this is a client-side program. You find it at https://www.sommarskog.se/arrays-in-sql-2008.html
    2. Parse the file in a program outside of SQL Server and pass the data though table-variable (which is exactly what that article describes), or just write a new file which is tab-delimited and import that file with BULK INSERT.
    3. Install an instance of SQL 2017 (or later) Express edition. In your main server, set up a linked server to the Express instance and remote control the BULK INSERT on the Express instance from the main server. You will need to load the file to table on the Express instance, and then copy table over the linked server.

    But you don't parse the file in T-SQL.

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Yitzhak Khabinsky 26,586 Reputation points
    2023-07-02T02:14:18.3166667+00:00

    Hi @Coreysan,

    Please try the following solution.

    It is based on tokenization, and using SQL Server's XML and XQuery.

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens varchar(4000));
    INSERT @tbl (tokens) VALUES
    ('one,two,three,four,five,"one, two", six,seven, "123 anywhere, CA", eight, nine');
    -- DDL and sample data population, end
    
    DECLARE @separator CHAR(1) = ','
    
    SELECT *
    	, c.value('(/r[1]/text())[1]', 'VARCHAR(20)') AS col1
    	, c.value('(/r[2]/text())[1]', 'VARCHAR(20)') AS col2
    	, c.value('(/r[3]/text())[1]', 'VARCHAR(20)') AS col3
    	, c.value('(/r[4]/text())[1]', 'VARCHAR(20)') AS col4
    	, c.value('(/r[5]/text())[1]', 'VARCHAR(20)') AS col5
    	, c.value('(/r[6]/text())[1]', 'VARCHAR(20)') AS col6
    	, c.value('(/r[7]/text())[1]', 'VARCHAR(20)') AS col7
    	, c.value('(/r[8]/text())[1]', 'VARCHAR(20)') AS col8
    	, c.value('(/r[9]/text())[1]', 'VARCHAR(20)') AS col9
    	, c.value('(/r[10]/text())[1]', 'VARCHAR(20)') AS col10
    	, c.value('(/r[11]/text())[1]', 'VARCHAR(20)') AS col11
    FROM @tbl AS t
        CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
            REPLACE(tokens, @separator, ']]></r><r><![CDATA[') + 
            ']]></r></root>' AS XML).query('
    		for $x in /root/r/text()
    			let $pos := count(root/r[. << $x])
    			let $token1 := $x cast as xs:token?
    			let $length1 := string-length($token1)
    			let $token2 := (/root/r[$pos + 1]/text())[1] cast as xs:token?
    			let $length2 := string-length($token2)
    		return if (substring($token1,1,1)="""" and substring($token2,$length2,1)="""") 
    			then <r>{concat(substring($token1,2,100), ", ", substring($token2,1,$length2 - 1))}</r>
    			else if (substring($token1,$length1,1)="""") then ()
    			else <r>{$token1}</r>
    ')) AS t1(c);
    
    
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.