Bulk insert and custom delimiters - (quote Separator - þ ) - (Column - ¶ )

ElDiablo 21 Reputation points
2021-04-04T13:28:35.247+00:00

I have search around but have not found anything that has worked in bulk inserting records into a SQL table. I have tried different variation using characters, ASCII and HEX values without success. Each time generating an error. I usually perform alteration in Excel (no SQL experience_), but this file has over 5M records. This has to be possible, does anyone have a working solution or provide additional guidance ? Thank you in advance.

ERROR:

Msg 4866, Level 16, State 1, Line 110 The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly. Msg 7399, Level 16, State 1, Line 110 The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error. Msg 7330, Level 16, State 2, Line 110 Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

Sample Data

84325-test.txt
84269-1.png

84332-2.png

-- Linefeed - 0x0A  
-- Quote  -  0xFE  
-- Column - 0X14  
  
  
  
BULK INSERT AllTags  
FROM 'C:\Data\Swap Drive\Elegance Brands\01. Document Export\REL000001-REL296747\VOL0002.DAT'  
WITH (  
	FIRSTROW = 2 ,  
	FIELDTERMINATOR = '0x14',  
    ROWTERMINATOR='\n' ,  
    BATCHSIZE=250000 ,  
    MAXERRORS= 2,  
	KEEPNULLS  
	);	  
GO  

Appreciated any assistance, thank you in advance.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,774 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,555 questions
0 comments No comments
{count} votes

Accepted answer
  1. MelissaMa-MSFT 24,176 Reputation points
    2021-04-05T08:20:57.447+00:00

    Hi @ElDiablo ,

    Welcome to Microsoft Q&A!

    After checking, your DDL of AllTags provided is not working with your text provided.

    It could be better to provide the complete DDL as below as a good starting:

    drop table if exists AllTags  
    CREATE TABLE dbo.AllTags (  
    REFERENCEID   varchar (100) ,  
    [BEGBATES]    varchar (100) ,  
    [ENDBATES]    varchar (100) ,  
    [BEGATTACH]   varchar (100) ,  
    [ENDATTACH]   varchar (100) ,  
    [PARENTBATES] varchar (100) ,  
    [ATTACHMENT]  varchar (1000),  
    CUSTODIAN varchar(100),   
    DUPCUSTODIAN varchar(100),   
    [FROM] varchar(100),   
    [TO] varchar(100),   
    CC varchar(100),   
    BCC varchar(100),   
    SUBJECT varchar(100),   
    DATESENT varchar(100),   
    TIMESENT varchar(100),   
    DATERCVD varchar(100),   
    TIMERCVD varchar(100),   
    FILEEXT varchar(100),   
    AUTHOR varchar(100),   
    CREATEDATE varchar(100),   
    CREATETIME varchar(100),   
    DATELASTMOD varchar(100),   
    TIMELASTMOD varchar(100),   
    FILENAME varchar(100),   
    DUPFILENAME varchar(100),   
    FILELENGTH varchar(100),   
    PGCOUNT varchar(100),   
    DOCTYPE varchar(100),   
    FAMDATE varchar(100),   
    FAMTIME varchar(100),   
    TIMEZONE varchar(100),   
    PATH varchar(max),   
    DUPPATH varchar(max),   
    DEDUPHASH varchar(100),   
    NATIVEPATH varchar(100),   
    OCRPATH varchar(100),   
    TITLE varchar(100),   
    COMPANY varchar(100),   
    DATEACCESSED varchar(100),   
    TIMEACCESSED varchar(100),   
    DATEPRINTED varchar(100),   
    TIMEPRINTED varchar(100),   
    CONVDATE varchar(100),   
    CONVTIME varchar(100),   
    ATTACHLIST varchar(100),   
    FAMILYRANGE varchar(100),   
    ALLCUSTODIANS varchar(100),   
    ALLFILENAMES varchar(100),   
    ALLFILEPATHS varchar(max),   
    HASHMD5 varchar(100),   
    HASHSHA varchar(100),   
    TAGS varchar(100),   
    DOCNOTE varchar(100),   
    PRIVNOTE varchar(100),   
    REDACTRSNS varchar(100),   
    DISCOID varchar(100),   
    MESSAGEID varchar(100),   
    THREADID varchar(100),   
    ATTACHCOUNT varchar(100),   
    HIDDENTYPE varchar(100),   
    METAREDACTED varchar(100),   
    INREPLYTOID varchar(100),   
    OBJECTHASH varchar(100),   
    REVISION varchar(100),   
    HEADER varchar(100),   
    IMPORTANCE varchar(100),   
    DELIVERYRECEIPT varchar(100),   
    READRECEIPT varchar(100),   
    SENSITIVITY varchar(100),   
    LASTAUTHOR varchar(100),   
    ESUBJECT varchar(100),   
    DATEAPPTSTART varchar(100),   
    DATEAPPTEND varchar(100),   
    CALBEGDATE varchar(100),   
    CALENDDATE varchar(100),   
    CALBEGTIME varchar(100),   
    CALENDTIME varchar(100),   
    CALENDUR varchar(100),   
    RECORDTYPE varchar(100),   
    REVISIONNUMBER varchar(100),   
    Exception varchar(100),   
    ExceptionDetails varchar(100),   
    TextLimitExceeded varchar(100)  
    )   
    

    If your version is SQL Server 2017 (14.x) CTP 1.1 and later, you could have a try with below:

    BULK INSERT AllTags  
    FROM 'C:\Data\Swap Drive\REL000001-REL296747\VOL0002.DAT'  
     WITH (  
         FIRSTROW = 2 ,  
         DATAFILETYPE ='char',  
         FIELDTERMINATOR = '0x14',  
         ROWTERMINATOR = '\n',  
         BATCHSIZE=250000 ,  
         CODEPAGE=65001,  
         MAXERRORS= 2,  
    	 FIELDQUOTE=N'þ' ,  ---adding this row  
         KEEPNULLS  
         );      
      
    select * from AllTags  
    

    Below is part of the output but is a good starting. Then you could try to deal with the data in TSQL side, like using REPLACE function.
    84428-tag.png

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


5 additional answers

Sort by: Most helpful
  1. ElDiablo 21 Reputation points
    2021-04-05T16:15:42.173+00:00

    UREKA!

    First - That you both for wisdom and tutelage. I would been pulling my hair out for a few more days.

    I have been reviewing the documentation on BCP and how to created the format file. since I am new to this stuff I have to figure out every piece to deal each database.

    I kept getting truncation error because of fields lengths. So I change all fields to (MAX) after I noticed @MelissaMan used one field type. I made the mistake of trying to keep field types as they appeared in the data and this makes it much easier. Since this is just a staging area for more field manipulation and the transposing of data, I don't think it affects anything.

    @Erland Sommarskog - I also took your last suggestion and striped out the "þ" character before importing the data. I just used the sed command in ubuntu shell (Command below), one needs to be careful of the file encoding as that changes the syntax of the command. since this is a UTF-8 file the "þ" character is represented by xC3\xBE. I will need to do this in SQL.

    sed -i '/s\xC3\xBE//g' VOL001.dat  
      
    https://unicode-table.com/en/00FE/  
    

    84505-done.png

    Again, thank you, this was EXCELLENT!