Only the additional column is populated in the sink when the sink type is Parquet?

Sudarshan Kumar 20 Reputation points
2024-02-13T08:35:58.21+00:00

Hi. In my Copy Data activity I am adding one additional column alongside the columns returned by the query, so I have 15 columns plus one additional column. When I use a CSV sink I get the correct result: the output contains all 15 columns plus the additional column, with their values. But when I change the sink type to Parquet, only the additional column has values in the output and the other 15 columns are blank. Am I missing something here? Please help.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Amira Bedhiafi 30,501 Reputation points
    2024-02-13T10:08:51.0433333+00:00

    Parquet files are stricter about schema than CSV files. Make sure that the schema defined (or inferred) in the Copy Data activity for the Parquet output file matches exactly the data types and column names you expect. Any omitted or mismatched item can prevent the data from being written correctly.
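To make the schema comparison concrete, here is a minimal Python sketch that diffs an expected schema against the one actually found in the sink. The column names, the type labels, and the `diff_schemas` helper are hypothetical; it only illustrates the kind of name, case, and type mismatch worth looking for and is not part of Azure Data Factory.

```python
from typing import Dict, List


def diff_schemas(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return readable differences between two {column_name: type} schemas."""
    problems = []
    # Index the actual columns by lower-cased name to detect case-only differences.
    actual_by_lower = {name.lower(): name for name in actual}
    expected_lower = {name.lower() for name in expected}

    for name, dtype in expected.items():
        if name in actual:
            if actual[name] != dtype:
                problems.append(
                    f"type mismatch for {name}: expected {dtype}, got {actual[name]}"
                )
        elif name.lower() in actual_by_lower:
            problems.append(
                f"case mismatch: expected {name}, got {actual_by_lower[name.lower()]}"
            )
        else:
            problems.append(f"missing column: {name}")

    for name in actual:
        if name.lower() not in expected_lower:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical schemas, for illustration only.
expected = {"order_id": "Int32", "customer": "String", "load_date": "String"}
actual = {"Order_Id": "Int32", "customer": "Int32", "load_date": "String"}

issues = diff_schemas(expected, actual)
for issue in issues:
    print(issue)
```

An empty result means the two schemas agree on names, case, and types.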

    In the Copy Data activity, particularly when the source is extended with an additional column, the source-to-sink column mapping has to be set up correctly. This includes the additional column. For Parquet sinks, make sure the mapping explicitly includes all 16 columns and that the data types are compatible.
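As a rough illustration of an explicit 16-column mapping, the Python sketch below generates the relevant part of a Copy activity's `typeProperties`: `additionalColumns` on the source and a `TabularTranslator` with one mapping per column. The column names (`col1` to `col15`, `load_date`), the static value, and the `AzureSqlSource` source type are assumptions, since the thread does not say what the actual source or columns are.

```python
import json


def build_copy_fragment(source_columns, additional_columns):
    """Build source/sink/translator settings that map every column explicitly."""
    all_columns = list(source_columns) + list(additional_columns)
    return {
        "source": {
            "type": "AzureSqlSource",  # assumption: replace with your source type
            "additionalColumns": [
                {"name": name, "value": value}
                for name, value in additional_columns.items()
            ],
        },
        "sink": {"type": "ParquetSink"},
        "translator": {
            "type": "TabularTranslator",
            # One entry per column, so nothing is left to implicit mapping.
            "mappings": [
                {"source": {"name": name}, "sink": {"name": name}}
                for name in all_columns
            ],
        },
    }


# Hypothetical column names: 15 query columns plus one additional column.
source_columns = [f"col{i}" for i in range(1, 16)]
additional_columns = {"load_date": "2024-02-13"}

fragment = build_copy_fragment(source_columns, additional_columns)
print(json.dumps(fragment["translator"], indent=2))
```

The printed JSON can be compared against the mapping shown in the pipeline's code view to check that all 16 columns appear.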

    If available, use the data preview feature in Azure Data Factory to inspect the output of the source dataset and of the transformation/query before it is written to the sink. Note: this feature is only available for ADF source datasets. This can help determine whether the problem already exists before the data is sent to the sink. See https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-parquet
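If the preview rows (or a sample read back from the sink) are exported for inspection, a small check like the following can show which columns are blank in every row. This is a sketch on hypothetical sample data; the `blank_columns` helper and the column names are illustrative, not an ADF feature.

```python
def blank_columns(rows):
    """Return the names of columns that are None or empty in every row."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    return [
        column
        for column in columns
        if all(row.get(column) in (None, "") for row in rows)
    ]


# Hypothetical sample resembling the symptom in the question:
# only the additional column (load_date) carries values.
sample = [
    {"order_id": None, "customer": "", "load_date": "2024-02-13"},
    {"order_id": None, "customer": None, "load_date": "2024-02-13"},
]

empty = blank_columns(sample)
print(empty)
```

Running the same check on the source preview and on the sink output indicates whether the values are lost before or during the write.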

    1 person found this answer helpful.

  2. AnnuKumari-MSFT 34,361 Reputation points Microsoft Employee
    2024-02-19T11:08:55.46+00:00

    Hi Sudarshan Kumar, thank you for using the Microsoft Q&A platform and for posting your query here.

    It seems that you are facing an issue with the Parquet sink type when adding an additional column to the query output columns.

    When you change the sink type from CSV to Parquet in your Copy Data activity, the schema of the output data may change. Parquet is a columnar storage format that is optimized for query performance, and it may handle data types differently than CSV.

    You could try loading the data to CSV first, and then use another copy activity to copy the data from CSV to Parquet format. Hope it helps. Kindly let us know how it goes. Please feel free to accept the answer by clicking the Accept answer button if you find the solution helpful. Thank you.
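The two-step workaround could look roughly like the pipeline skeleton generated below: one Copy activity writes the query result to CSV, and a second one, which runs only after the first succeeds, converts the CSV to Parquet. The activity and pipeline names and the `AzureSqlSource` type are assumptions, and dataset references (`inputs`/`outputs`) are omitted, so this is an incomplete fragment rather than a deployable pipeline.

```python
import json


def staged_pipeline(query_source_type):
    """Sketch of a pipeline that stages the query result as CSV, then writes Parquet."""
    to_csv = {
        "name": "CopyQueryToCsv",  # hypothetical activity name
        "type": "Copy",
        "typeProperties": {
            "source": {"type": query_source_type},
            "sink": {"type": "DelimitedTextSink"},
        },
    }
    to_parquet = {
        "name": "CopyCsvToParquet",  # hypothetical activity name
        "type": "Copy",
        # Run only after the CSV staging step has succeeded.
        "dependsOn": [
            {"activity": "CopyQueryToCsv", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "ParquetSink"},
        },
    }
    return {
        "name": "StagedCsvToParquet",  # hypothetical pipeline name
        "properties": {"activities": [to_csv, to_parquet]},
    }


pipeline = staged_pipeline("AzureSqlSource")  # assumption: your source type may differ
print(json.dumps(pipeline, indent=2))
```

In this layout the additional column would be added in the first activity, so the second activity sees it as an ordinary CSV column.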

