The error message you're encountering, "Could not find a delimiter after string delimiter," typically occurs when the delimiter used in your file conflicts with the delimiter used within a string value. In your case, the pipe delimiter (|) used in your file conflicts with the pipe delimiter used within the string values.
To overcome this error, you have a couple of options:
Change the Delimiter: If possible, consider using a different delimiter for your file that doesn't conflict with any characters used within the string values. For example, you could use a different delimiter like a comma (,), tab (\t), or any other character that is not present within the string values.
Use Escape Characters: Another approach is to use escape characters to indicate that a specific character is part of the data and should not be interpreted as a delimiter. For example, you can use double quotes (") as the string delimiter and escape any internal double quotes with an additional double quote. Here's an updated example based on your data:
sql
Copy code
CREATE EXTERNAL TABLE [Test].[tblStaging_Page]
(
SessionID [INT] NOT NULL,
Date [Date] NOT NULL,
Article VARCHAR(2000) NULL,
Content VARCHAR(2000) NULL
)
WITH
(
LOCATION = '/Test/Inbound/Digital_Test/Page/Landing',
DATA_SOURCE = ADLSGen2TestDataSource,
FILE_FORMAT = DelimitedText_PipeTerminator_QuoteDelimiter_FirstRowAsHeader,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0,
ESCAPE_CHARACTER = '\\',
ROW_DELIMITER = '0x0A',
FIELD_QUOTE = '"',
FIELD_ESCAPE = '"'
);
In this example, I've set the FIELD_QUOTE to double quotes (") and specified FIELD_ESCAPE as double quotes as well. The FIELD_ESCAPE ensures that any double quotes within the string values are escaped.
By using escape characters, the third line of your example data will be correctly interpreted, and the error should no longer occur.
Please note that the specific syntax for configuring escape characters and other options may depend on the SQL dialect or tool you are using with Azure Synapse. Adjust the syntax as needed for your specific environment.
How to fix error 'Could not find a delimiter after string delimiter' with External table on Synapse?
Using Synapse, I have an external table that points to an ADLS Gen2 folder with multiple pipe-delimited files. As my pipe-delimited file, my external table has four columns. The external table file format is as follows:
CREATE EXTERNAL TABLE [Test].[tblStaging_Page]
(
SessionID [INT] NOT NULL,
Date [Date] NOT NULL,
Article VARCHAR(2000) NULL,
Content VARCHAR(2000) NULL
)
WITH (
LOCATION = '/Test/Inbound/Digital_Test/Page/Landing',
DATA_SOURCE = ADLSGen2TestDataSource,
FILE_FORMAT = DelimitedText_PipeTerminator_QuoteDelimiter_FirstRowAsHeader
);
My issue is with the error -> 'Could not find a delimiter after string delimiter', It occurs whenever I have the following data in a pipe-delimited file. Please refer to the last column of the third line below, which creates this issue.
SessionID|Date|Article|Content
12312|2023-04-01|Youtube growth|What is happening?
13354|2023-03-15|"Home | Facebook"|"Content is about 'VVIP"" people"
45565|2023-04-11|"Home | Instagram"|Nothing new, change
How can I overcome this error? What change do I need to make to overcome this?
2 answers
Sort by: Most helpful
-
VasimTamboli 4,910 Reputation points
2023-05-12T17:06:48.4833333+00:00 -
HimanshuSinha-msft 19,471 Reputation points Microsoft Employee
2023-05-18T20:09:48.4033333+00:00 Hello @Vivek Komarla Bhaskar ,
The reason you are getting the error is because the
The below line
Has got just too many " and its breaking the flow .
This is what tried which worked for me
Used the below Spark code to cleanup the data . You can run this from the Synapse studio . The below code cleans the file and creates "clean" file to be consumed by the external table .
%%pyspark
from pyspark.sql.functions import *
df = spark.read.load('abfss://hsinha@yourstorageaccount .dfs.core.windows.net/data/flight-data/csv/1.csv', format='csv'
)
df=df.withColumn('_c0', regexp_replace('_c0', '"', ''))
df.write.format("csv").save("abfss://hsinha@yourstoragecct.dfs.core.windows.net/data/flight-data/csv/curated")
display(df.limit(10))
External Table
CREATE EXTERNAL TABLE dbo.test1 ( [C1] bigint, [C2] date, [C3] nvarchar(4000), [C4] nvarchar(4000), [C5] nvarchar(4000) ) WITH ( LOCATION = 'data/flight-data/csv/curated/part-00000-3bfab150-c047-4d94-966e-d310c99d2a43-c000.csv', DATA_SOURCE = [hsinha_analyticssynapsews_dfs_core_windows_net], FILE_FORMAT = [SynapseDelimitedTextFormat] ) GO SELECT TOP 100 * FROM dbo.test1
Thanks
Himanshu
Please accept as "Yes" if the answer provided is useful , so that you can help others in the community looking for remediation for similar issues.