How to fix error 'Could not find a delimiter after string delimiter' with External table on Synapse?

Vivek Komarla Bhaskar 911 Reputation points
2023-05-09T22:02:53.04+00:00

Using Synapse, I have an external table that points to an ADLS Gen2 folder containing multiple pipe-delimited files. Matching the pipe-delimited files, the external table has four columns. The external file format (screenshot below) and the external table definition are as follows:

[Screenshot: external file format definition]

CREATE EXTERNAL TABLE [Test].[tblStaging_Page]
(
    SessionID [INT] NOT NULL,
    Date [Date] NOT NULL,
    Article VARCHAR(2000) NULL,
    Content VARCHAR(2000) NULL
)
WITH (
    LOCATION = '/Test/Inbound/Digital_Test/Page/Landing',
    DATA_SOURCE = ADLSGen2TestDataSource,
    FILE_FORMAT = DelimitedText_PipeTerminator_QuoteDelimiter_FirstRowAsHeader
);

My issue is with the error 'Could not find a delimiter after string delimiter'. It occurs whenever a pipe-delimited file contains data like the sample below; the last column of the third data row is what triggers the error.

SessionID|Date|Article|Content
12312|2023-04-01|Youtube growth|What is happening?
13354|2023-03-15|"Home | Facebook"|"Content is about 'VVIP"" people"
45565|2023-04-11|"Home | Instagram"|Nothing new, change

How can I overcome this error? What change do I need to make to overcome this?

Azure Synapse Analytics

2 answers

  1. VasimTamboli 4,910 Reputation points
    2023-05-12T17:06:48.4833333+00:00
    The error message you're encountering, "Could not find a delimiter after string delimiter," typically occurs when characters inside a quoted string value conflict with the file's delimiters. In your case, the quoted values contain both the pipe field delimiter (|) and embedded double quotes, so the parser cannot tell where a field ends.
    
    To overcome this error, you have a couple of options:
    
    Change the Delimiter: If possible, re-export the files with a delimiter that doesn't conflict with any characters used within the string values, for example a tab (\t) or another character that never appears in the data (a comma would not help here, since the Content values also contain commas).
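
    For instance, if the files were re-exported tab-delimited, only the FIELD_TERMINATOR of the external file format would need to change. A minimal sketch, assuming a tab-delimited re-export; the format name here is made up and the remaining options mirror the ones implied by your existing format:

    CREATE EXTERNAL FILE FORMAT DelimitedText_TabTerminator_QuoteDelimiter_FirstRowAsHeader
    WITH
    (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS
        (
            FIELD_TERMINATOR = '\t',   -- tab instead of the pipe that clashes with the data
            STRING_DELIMITER = '"',
            FIRST_ROW = 2
        )
    );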
    
    Use Escape Characters: Another approach is to use escape characters to indicate that a specific character is part of the data and should not be interpreted as a delimiter. For example, you can use double quotes (") as the string delimiter and escape any internal double quotes with an additional double quote. Here's an updated example based on your data:
    
    -- (Re)create the file format so the quote handling is explicit;
    -- drop the existing table and format first if they already exist.
    CREATE EXTERNAL FILE FORMAT DelimitedText_PipeTerminator_QuoteDelimiter_FirstRowAsHeader
    WITH
    (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS
        (
            FIELD_TERMINATOR = '|',
            STRING_DELIMITER = '"',
            FIRST_ROW = 2
        )
    );

    CREATE EXTERNAL TABLE [Test].[tblStaging_Page]
    (
        SessionID [INT] NOT NULL,
        Date [Date] NOT NULL,
        Article VARCHAR(2000) NULL,
        Content VARCHAR(2000) NULL
    )
    WITH
    (
        LOCATION = '/Test/Inbound/Digital_Test/Page/Landing',
        DATA_SOURCE = ADLSGen2TestDataSource,
        FILE_FORMAT = DelimitedText_PipeTerminator_QuoteDelimiter_FirstRowAsHeader,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
    In this example, the quote handling lives on the external file format: STRING_DELIMITER is set to double quotes ("), so quoted values can contain the pipe character, and any double quote inside a value is escaped by doubling it in the data. The REJECT_TYPE and REJECT_VALUE options on the table control how many unparsable rows are tolerated. Note that CREATE EXTERNAL TABLE itself does not accept quote or escape options such as FIELD_QUOTE or ESCAPE_CHARACTER; those settings belong on the external file format.

    With the internal quotes escaped in this way, the third line of your example data should be correctly interpreted, and the error should no longer occur.
    
    Please note that the options available for quote and escape handling differ between dedicated SQL pools, serverless SQL pools, and OPENROWSET queries in Azure Synapse. Adjust the syntax as needed for your specific environment.
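
    As a concrete example of that difference: when querying from a serverless SQL pool, OPENROWSET exposes the quote handling directly through FIELDQUOTE (and also accepts an ESCAPECHAR argument for files that use an explicit escape character). A rough sketch, assuming the same data source and folder as above; the wildcard path and PARSER_VERSION are assumptions to adjust, and the available arguments should be checked against the OPENROWSET documentation for your parser version:

    SELECT TOP 100 *
    FROM OPENROWSET(
            BULK 'Test/Inbound/Digital_Test/Page/Landing/*',
            DATA_SOURCE = 'ADLSGen2TestDataSource',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0',
            FIELDTERMINATOR = '|',
            FIELDQUOTE = '"',
            FIRSTROW = 2
        )
        WITH (
            SessionID INT,
            [Date] DATE,
            Article VARCHAR(2000),
            Content VARCHAR(2000)
        ) AS src;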
    
    

  2. HimanshuSinha-msft 19,471 Reputation points Microsoft Employee
    2023-05-18T20:09:48.4033333+00:00

    Hello @Vivek Komarla Bhaskar,

    The reason you are getting the error is that the line below (the third row of your sample data) has just too many " characters, and that breaks the parsing:

    13354|2023-03-15|"Home | Facebook"|"Content is about 'VVIP"" people"

    This is what I tried, and it worked for me.

    I used the Spark code below to clean up the data; you can run it from Synapse Studio. It strips the problem characters from the file and creates a "clean" copy to be consumed by the external table.

    %%pyspark
    from pyspark.sql.functions import regexp_replace

    # Read the raw file; with no delimiter option, the pipe-delimited content lands in the first column, _c0.
    df = spark.read.load(
        'abfss://hsinha@yourstorageaccount.dfs.core.windows.net/data/flight-data/csv/1.csv',
        format='csv'
    )

    # Remove every double quote from the first column so the quotes can no longer break parsing.
    df = df.withColumn('_c0', regexp_replace('_c0', '"', ''))

    # Write the cleaned data out for the external table to point at.
    df.write.format("csv").save("abfss://hsinha@yourstoragecct.dfs.core.windows.net/data/flight-data/csv/curated")

    display(df.limit(10))


    External Table

    CREATE EXTERNAL TABLE dbo.test1 (
        [C1] bigint,
        [C2] date,
        [C3] nvarchar(4000),
        [C4] nvarchar(4000),
        [C5] nvarchar(4000)
    )
    WITH (
        LOCATION = 'data/flight-data/csv/curated/part-00000-3bfab150-c047-4d94-966e-d310c99d2a43-c000.csv',
        DATA_SOURCE = [hsinha_analyticssynapsews_dfs_core_windows_net],
        FILE_FORMAT = [SynapseDelimitedTextFormat]
    )
    GO
    
    
    SELECT TOP 100 * FROM dbo.test1
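
    A small note on the LOCATION used above: the part-00000-... file name is generated by Spark and will change every time the cleanup notebook rewrites the folder. Assuming the curated folder contains only the cleaned output, a variant that points at the folder instead of a single part file avoids editing the table after each run (the table name below is made up so it doesn't clash with dbo.test1):

    CREATE EXTERNAL TABLE dbo.test1_curated_folder (
        [C1] bigint,
        [C2] date,
        [C3] nvarchar(4000),
        [C4] nvarchar(4000),
        [C5] nvarchar(4000)
    )
    WITH (
        -- Folder location: the readable files under curated/ are picked up,
        -- so the Spark-generated part-file name no longer matters.
        LOCATION = 'data/flight-data/csv/curated/',
        DATA_SOURCE = [hsinha_analyticssynapsews_dfs_core_windows_net],
        FILE_FORMAT = [SynapseDelimitedTextFormat]
    )
    GO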
    


    Thanks

    Himanshu

    Please accept the answer as "Yes" if it is useful, so that you can help others in the community looking for remediation for similar issues.

