Bulk insert and custom delimiters - (quote Separator - þ ) - (Column - ¶ )

ElDiablo 21 Reputation points
2021-04-04T13:28:35.247+00:00

I have search around but have not found anything that has worked in bulk inserting records into a SQL table. I have tried different variation using characters, ASCII and HEX values without success. Each time generating an error. I usually perform alteration in Excel (no SQL experience_), but this file has over 5M records. This has to be possible, does anyone have a working solution or provide additional guidance ? Thank you in advance.

ERROR:

Msg 4866, Level 16, State 1, Line 110 The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly. Msg 7399, Level 16, State 1, Line 110 The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error. Msg 7330, Level 16, State 2, Line 110 Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

Sample Data

84325-test.txt
84269-1.png

84332-2.png

-- Linefeed - 0x0A  
-- Quote  -  0xFE  
-- Column - 0X14  
  
  
  
BULK INSERT AllTags  
FROM 'C:\Data\Swap Drive\Elegance Brands\01. Document Export\REL000001-REL296747\VOL0002.DAT'  
WITH (  
	FIRSTROW = 2 ,  
	FIELDTERMINATOR = '0x14',  
    ROWTERMINATOR='\n' ,  
    BATCHSIZE=250000 ,  
    MAXERRORS= 2,  
	KEEPNULLS  
	);	  
GO  

Appreciated any assistance, thank you in advance.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,714 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
0 comments No comments
{count} votes

Accepted answer
  1. MelissaMa-MSFT 24,176 Reputation points
    2021-04-05T08:20:57.447+00:00

    Hi @ElDiablo ,

    Welcome to Microsoft Q&A!

    After checking, your DDL of AllTags provided is not working with your text provided.

    It could be better to provide the complete DDL as below as a good starting:

    drop table if exists AllTags  
    CREATE TABLE dbo.AllTags (  
    REFERENCEID   varchar (100) ,  
    [BEGBATES]    varchar (100) ,  
    [ENDBATES]    varchar (100) ,  
    [BEGATTACH]   varchar (100) ,  
    [ENDATTACH]   varchar (100) ,  
    [PARENTBATES] varchar (100) ,  
    [ATTACHMENT]  varchar (1000),  
    CUSTODIAN varchar(100),   
    DUPCUSTODIAN varchar(100),   
    [FROM] varchar(100),   
    [TO] varchar(100),   
    CC varchar(100),   
    BCC varchar(100),   
    SUBJECT varchar(100),   
    DATESENT varchar(100),   
    TIMESENT varchar(100),   
    DATERCVD varchar(100),   
    TIMERCVD varchar(100),   
    FILEEXT varchar(100),   
    AUTHOR varchar(100),   
    CREATEDATE varchar(100),   
    CREATETIME varchar(100),   
    DATELASTMOD varchar(100),   
    TIMELASTMOD varchar(100),   
    FILENAME varchar(100),   
    DUPFILENAME varchar(100),   
    FILELENGTH varchar(100),   
    PGCOUNT varchar(100),   
    DOCTYPE varchar(100),   
    FAMDATE varchar(100),   
    FAMTIME varchar(100),   
    TIMEZONE varchar(100),   
    PATH varchar(max),   
    DUPPATH varchar(max),   
    DEDUPHASH varchar(100),   
    NATIVEPATH varchar(100),   
    OCRPATH varchar(100),   
    TITLE varchar(100),   
    COMPANY varchar(100),   
    DATEACCESSED varchar(100),   
    TIMEACCESSED varchar(100),   
    DATEPRINTED varchar(100),   
    TIMEPRINTED varchar(100),   
    CONVDATE varchar(100),   
    CONVTIME varchar(100),   
    ATTACHLIST varchar(100),   
    FAMILYRANGE varchar(100),   
    ALLCUSTODIANS varchar(100),   
    ALLFILENAMES varchar(100),   
    ALLFILEPATHS varchar(max),   
    HASHMD5 varchar(100),   
    HASHSHA varchar(100),   
    TAGS varchar(100),   
    DOCNOTE varchar(100),   
    PRIVNOTE varchar(100),   
    REDACTRSNS varchar(100),   
    DISCOID varchar(100),   
    MESSAGEID varchar(100),   
    THREADID varchar(100),   
    ATTACHCOUNT varchar(100),   
    HIDDENTYPE varchar(100),   
    METAREDACTED varchar(100),   
    INREPLYTOID varchar(100),   
    OBJECTHASH varchar(100),   
    REVISION varchar(100),   
    HEADER varchar(100),   
    IMPORTANCE varchar(100),   
    DELIVERYRECEIPT varchar(100),   
    READRECEIPT varchar(100),   
    SENSITIVITY varchar(100),   
    LASTAUTHOR varchar(100),   
    ESUBJECT varchar(100),   
    DATEAPPTSTART varchar(100),   
    DATEAPPTEND varchar(100),   
    CALBEGDATE varchar(100),   
    CALENDDATE varchar(100),   
    CALBEGTIME varchar(100),   
    CALENDTIME varchar(100),   
    CALENDUR varchar(100),   
    RECORDTYPE varchar(100),   
    REVISIONNUMBER varchar(100),   
    Exception varchar(100),   
    ExceptionDetails varchar(100),   
    TextLimitExceeded varchar(100)  
    )   
    

    If your version is SQL Server 2017 (14.x) CTP 1.1 and later, you could have a try with below:

    BULK INSERT AllTags  
    FROM 'C:\Data\Swap Drive\REL000001-REL296747\VOL0002.DAT'  
     WITH (  
         FIRSTROW = 2 ,  
         DATAFILETYPE ='char',  
         FIELDTERMINATOR = '0x14',  
         ROWTERMINATOR = '\n',  
         BATCHSIZE=250000 ,  
         CODEPAGE=65001,  
         MAXERRORS= 2,  
    	 FIELDQUOTE=N'þ' ,  ---adding this row  
         KEEPNULLS  
         );      
      
    select * from AllTags  
    

    Below is part of the output but is a good starting. Then you could try to deal with the data in TSQL side, like using REPLACE function.
    84428-tag.png

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


5 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 101K Reputation points MVP
    2021-04-04T15:46:53.633+00:00

    The file you attached is encoded as UTF-8, so you would need to specify CODEPAGE=65001.

    But that is not the sole solution. The file format is kind of difficult to read. Do you have a specification of the fields? And what is the definition of the table AllTags?

    I'm sure that I understand the delimiter table. It seems that 0x14 is indeed the field terminator, but the glyph shown is the pilcrow which has a different character code. (0x14 is an invisible control character). The Reg-trade mark sign is said the be the line terminator, but the file has CR-LF like a regular Windows file, and you have specified \n as the field terminator.

    As for the quote character, we may be table to deal with that with BCP. For the other two - no.


  2. ElDiablo 21 Reputation points
    2021-04-04T16:33:16.167+00:00

    Getting closer,

    BULK INSERT AllTags
    FROM 'C:\Data\Swap Drive\REL000001-REL296747\VOL0002.DAT'
    WITH (
        FIRSTROW = 2 ,
        DATAFILETYPE ='char',
        FIELDTERMINATOR = '0x14',
        ROWTERMINATOR = '\n',
        BATCHSIZE=250000 ,
        CODEPAGE=65001,
        MAXERRORS= 2,
        KEEPNULLS
        );  
    GO
    

    Resulted in

    Msg 4863, Level 16, State 1, Line 102
    Bulk load data conversion error (truncation) for row 2, column 2 (ENDBATES).
    Msg 4863, Level 16, State 1, Line 102
    Bulk load data conversion error (truncation) for row 3, column 2 (ENDBATES).
    Msg 4863, Level 16, State 1, Line 102
    Bulk load data conversion error (truncation) for row 4, column 2 (ENDBATES).
    Msg 4865, Level 16, State 1, Line 102
    Cannot bulk load because the maximum number of errors (2) was exceeded.
    Msg 7399, Level 16, State 1, Line 102
    The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
    Msg 7330, Level 16, State 2, Line 102
    Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

    0 comments No comments

  3. Erland Sommarskog 101K Reputation points MVP
    2021-04-04T18:28:47.87+00:00

    I examined your file, this way. I first created a format file:

    14.0
    1
    1 SQLCHAR 0 0 "\r\n" 2 StyleCode Finnish_Swedish_100_CS_AS_SC_UTF8
    

    Yes, this format file says that it is all one line.

    Then I ran this:

    drop table if exists AllTags
    go
    CREATE TABLE dbo.AllTags (
       rowno int IDENTITY(1,1) NOT NULL PRIMARY KEY,
       longline nvarchar(MAX) NOT NULL
    )
    go
         BULK INSERT AllTags
         FROM 'C:\temp\slask.bcp'
         WITH (
             FORMATFILE = 'C:\temp\slask.fmt',
             CODEPAGE=65001
             );    
    go
    SELECT rowno, COUNT(*)
    FROM   AllTags a
    CROSS  APPLY string_split(a.longline, nchar(0x14))
    GROUP BY rowno
    

    The output said that were 84 fields on all rows with 0x14 as the delimiter. That is good news so far that this means that we may be able to process this file with BULK INSERT.

    (The IDENTITY column is there to allow the grouping, and this IDENTITY column is the reason I have the format file.)

    I don't know exactly what your ambition is, but your table has far too columns to accommodate all fields. Then again, you call the last column ATTACHMENT, so maybe you only care about the first columns and don't care about the rest.

    In any case, the reason you get the truncation messages is simply that your column widths are too narrow. The longest line in your sample file is 1354 characters. I was able to load the file without errors with your BULK INSERT statement, when I set the table this way:

    CREATE TABLE dbo.AllTags (
    [BEGBATES]    varchar (20) ,
    [ENDBATES]    varchar (20) ,
    [BEGATTACH]   varchar (20) ,
    [ENDATTACH]   varchar (20) ,
    [PARENTBATES] varchar (20) ,
    [ATTACHMENT]  varchar (2000)
    )
    

    When it comes to the thorns, I suspect that we are out of luck. I have found that BULK INSERT is not able to deal with non-ASCII delimiters in UTF-8 files. But you can strip them in a separate step.

    0 comments No comments

  4. ElDiablo 21 Reputation points
    2021-04-04T19:45:04.15+00:00

    These files are an export of an application with an SQL backend with various database. Each database will have any number of fields (which is why use the excel) to quickly create the table in SQL. I have to create various extractions to be able to import this into another database and i can only export this data from the front end. This one has almost 100 columns, others have more.

    I usually do this in excel,

    • Replace 0xFE-0x14- 0xFE to /t (tab)
    • remove excessive 0xFE (beginning and end of line characters)
    • load to excel

    *sometime depending on the data, columns get scattered, so it doesn't always work.

    However, excel starts to choke after a million records or so. this one has 5M, I thought the next logical step here was to start seeing how i can do this in SQL. into the deep end of the pool without a life jacket and not knowing how to swim.

    I will modify the able now to see what I get.

    Much appreciated the assistance, than you!

    0 comments No comments