Load CSV file with slightly different schemas during each run

2024-03-18T17:01:42.6+00:00

I have a CSV file with Column headers as the first line but often sometimes the column position will vary but will be available in a different place. Please help me to advise how to load this dynamic schema change in CSV file to SQL server tables. Currently I am using .net to load this data from CSV to SQL tables, but open to any other language or tool which can handle this change without modifying much in the code for future.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,607 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Yitzhak Khabinsky 24,911 Reputation points
    2024-03-18T18:05:42.5166667+00:00

    Hi @Shreedevi Sendarapatti Ramamoorthy,

    *.csv/txt flat files always have host of problems where column delimiters, column separators, invisible characters like null terminators \0, special characters based on encoding, and line breaks are in the middle of the actual data. Or data elements are missing completely, i.e. column positions will vary.

    The most reliable format for data feeds is XML enforced by an XSD. An XSD plays a role of a data contract between sender and receiver. It will guarantee proper data format/shape, data types, cardinality, and enforce data quality.

    For example, your XML file has the following structure:

    <?xml version="1.0" encoding="utf-8"?>
    <books>
      <book id="1">
        <title>Book A</title>
        <author>Jack Welch</author>
      </book>
      <book id="2">
        <title>Book B</title>
        <author>Henry Ford</author>
      </book>
    </books>
    
    
    

    Sequential position or missing XML elements title, author, etc. it is all irrelevant. Pass a fully qualified XML file path @Filepath to a stored procedure.

    /*
    Step #1: Load XML file into single row db table
    */
    DECLARE @tbl TABLE (XMLColumn XML);
    DECLARE @Filepath VARCHAR(100) = 'e:\Temp\books.xml';
    DECLARE @sql NVARCHAR(MAX) = 
    N'SELECT BulkColumn 
       FROM Openrowset(Bulk ' + QUOTENAME(@Filepath, NCHAR(39)) + ', Single_Blob) x;';
    PRINT @sql;
    INSERT INTO @tbl (XMLColumn)
    EXEC sys.sp_executesql @sql;
    -- just to see
    SELECT * FROM @tbl;
    /*
    Step #2: Shred XML without dynamic SQL
    */
    --INSERT INTO <TARGET TABLE>	-- uncomment when you are ready
    SELECT c.value('@id', 'INT') AS id
    	, c.value('(title/text())[1]', 'NVARCHAR(50)') As title
    	, c.value('(author/text())[1]', 'NVARCHAR(50)') As author
    FROM @tbl
    CROSS APPLY XMLColumn.nodes('/books/book') AS t(c);
    
    0 comments No comments