CSV to Parquet table - error due to pipe elimter present in actual data

Raffy Comendador 6 Reputation points
2021-11-26T08:20:31.127+00:00

Good day Guru's,

requesting your advise please. I need to ingest the data from pipe delimited file to a parquet table.

sample data is

012312|Manila Phils|+631239098459|RaffyC|MyEmailAdd|@réalisations .com
012432|Cebu Phils|+636549098459|Juan|Your|_|EmailAdd@réalisations .com

NOTE:

  1. Setting used to read the file (see image) and there is no schema predefined since up to filename is parameterized.

152787-image.png

  1. Setting used to write to Parquet table (see image) and there is no schema predefined since up to filename is parameterized.

152861-image.png

  1. Actual columns are: PID|Country|Phone|Name|EmailAddress . That's only 5 columns, but there is a PIPE delimiter in the actual email addresses, perhaps due to a mistake during data entry or its part of the data itself since there are hundreds like these in other columns such as address or names in other data sources.

The error message is

": "Failure happened on 'Sink' side. ErrorCode=DelimitedTextMoreColumnsThanDefined,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text'
source 'ROR_ACNTS_0101.RIB' with row number 192759: found more columns than expected column count 5.,Source=Microsoft.DataTransfer.Common,'",

Thank you in advance

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
6,342 questions
{count} votes

1 answer

Sort by: Most helpful
  1. ShaikMaheer-MSFT 28,261 Reputation points Microsoft Employee
    2021-11-26T09:39:11.74+00:00

    Hi @Raffy Comendador ,

    Thank you for posting query in Microsoft Q&A Platform.

    Since you have column delimiter(|) even inside your data, ADF reads that data also as separate columns and giving error saying there are more columns than defined.

    To avoid this error you can either modify your upstream to get data in source file with out column delimiters(|) in side columns or kindly try to get your columns data wrapped in double quotes(") as shown below.

    "012312"|"Manila Phils"|"+631239098459"|"RaffyC"|"MyEmailAdd|@réalisations .com"
    "012432"|"Cebu Phils"|"+636549098459"|"Juan"|"Your|_|EmailAdd@réalisations .com"

    When your columns data wrapped in double quotes automatically if any column delimiters(|) inside that then they will get escaped.

    You can leverage Azure functions or some code which does this processing of correcting data first and then use that corrected inside Azure data factory.

    Hope this will help. Please let us know if any further queries. Thank you.

    -------------------------

    Please consider hitting Accept Answer if its helpful.