Delimited Text(CSV) dataset escape/quote character

Santhi Dhanuskodi 325 Reputation points
2024-07-24T10:52:13.6533333+00:00

Hi,

I am facing an issue with delimited text(CSV) dataset, whne my data contains comma.

So I added quote character in configuration. But this double quotes character is applied for most of the column values, even though the column value doesnt contain comma.

for eg my data looks like

"testt","test","test","O41","test","test","test","test","D",216,"","2016-06",2016-06-01 00:00:00.0000000,23335.4000

the above data doesnt have comma in values, but I can see double quotes for most, but number, and date values are not double quoted. And some numbers, some dates are quoted. I dont know what data will be enclosed with double quotes, and what not. how it picks up the data?

User's image

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,667 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 34,561 Reputation points Microsoft Employee Moderator
    2024-07-24T13:19:10.5166667+00:00

    Hi Santhi Dhanuskodi ,

    Thankyou for posting your query on Microsoft Q&A platform.

    The quoteChar property is used in delimited text files in ADF is used to specify the character that is used to enclose column values. If the quoteChar is set to double quotes ", it means string column values in the CSV file would be enclosed within double quotes , it's applicable for both the types of columns having delimited character and not having delimited character.

    Quote char typically are used to handle cases where the data itself contains delimiter characters (such as commas in CSV files) or newline characters. But it doesn't mean it will skip the columns not having delimited characters.

    For example , suppose your source is SQL table and it contains data as:

    id(int), name(varchar),CreatedDate (date)
    1, 'ABC', 
    

    then having quote char as " in delimited text sink dataset would result the data as:

    id,name,CreatedDate
    1,"ABC",2016-06-01 00:00:00.0000000
    

    Hope it helps. If you have any further query, kindly let us know. Kindly accept the answer by clicking on Accept answer button. Thankyou

    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Amira Bedhiafi 34,651 Reputation points Volunteer Moderator
    2024-07-24T17:32:53.8966667+00:00

    Given your example :

    "testt","test","test","O41","test","test","test","test","D",216,"","2016-06",2016-06-01 00:00:00.0000000,23335.4000
    

    Here’s what might be happening:

    • The default behavior of some CSV generators is to always quote text fields, as they are more likely to contain special characters.
    • These fields may or may not be quoted based on their content and the configuration of the CSV generator.
    • Empty fields are often quoted to clearly denote the empty value.

    So what to do ?

    • If you want all fields to be consistently quoted, you can configure this in ADF.
    • If you prefer conditional quoting (only quote fields with special characters), ensure your configuration supports this and that you understand when it will apply quotes.
    • If double quotes are causing issues, you can specify a different quote character.

    This how you may define the JSON config :

    {
        "name": "DelimitedTextDataset",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "fileName": "yourfile.csv",
                    "folderPath": "yourfolderpath"
                },
                "columnDelimiter": ",",
                "quoteChar": "\"",
                "escapeChar": "\\",
                "firstRowAsHeader": true,
                "quoteAllText": true
            }
        }
    }
    

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.