Converting csv to parquet is not populating correct values in the columns in parquet file

Madugundu Somashekara, Roopa 46 Reputation points
2024-01-03T07:55:10.59+00:00

Hi Team,
I am trying to convert a simple .csv file with a large number of columns (~161) into a Parquet file using a mapping data flow. All the columns show correct values in 'Data Preview', but when the data flow is run, the generated Parquet file has drifted values in some of the columns, which is wrong. I am not sure what is happening on the destination side.
For the destination setting, we are using 'Use current partitioning'.
Could you help us understand where we are going wrong?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Vinodh247 34,661 Reputation points MVP Volunteer Moderator
    2024-01-03T11:15:39.0133333+00:00

    Hi Madugundu Somashekara, Roopa:

    Thanks for reaching out to Microsoft Q&A.

    I see that a similar question from you has been answered by Mark and Kranthi. While that might not be the solution for this issue, it shows that CSV doesn't maintain data types.

    Did you try the same options and still get this value mismatch?

    https://learn.microsoft.com/en-us/answers/questions/856132/converting-csv-to-parquet-format-with-correct-data
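
    To illustrate the point that CSV doesn't maintain data types, here is a minimal Python sketch using only the standard library. The sample data, the column names and the SCHEMA mapping are made up for illustration and are not taken from the actual file; the idea is only that every value read from a CSV is a string until a type is applied explicitly.

```python
import csv
import io
from datetime import date

# Hypothetical sample data; the real file has ~161 columns.
SAMPLE = "id,amount,booked_on\n1,10.50,2024-01-03\n2,7.25,2024-01-04\n"

# Hypothetical schema: column name -> function that converts the raw string.
SCHEMA = {"id": int, "amount": float, "booked_on": date.fromisoformat}


def read_typed(text, schema):
    """Parse CSV text and cast each column with the converter given in schema."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({name: convert(raw[name]) for name, convert in schema.items()})
    return rows


# Without a schema, every value comes back as a string.
raw_rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# With an explicit schema, values carry the intended types.
typed_rows = read_typed(SAMPLE, SCHEMA)
```

    In a mapping data flow the equivalent step would be setting the column types explicitly in the source projection or in a derived column, rather than relying on whatever the CSV appears to contain.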

    Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.


  2. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2024-01-04T07:01:51.59+00:00

    Hi Madugundu Somashekara, Roopa,

    Thank you for using the Microsoft Q&A platform and for posting your query here.

    As I understand it, you are trying to convert CSV into Parquet using a data flow in an ADF pipeline but are not getting the expected results.

    Could you please try using the 'Map drifted' option in the Data Preview tab?

    Go to the Data Preview tab and click Refresh to fetch a data preview. If Data Factory detects that drifted columns exist, you can click Map Drifted and generate a derived column that allows you to reference all drifted columns in schema views downstream.

    [Screenshot: the Data Preview tab with Map drifted called out.]

    For more details, kindly check this documentation: Map drifted columns quick action
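
    Separately from Map Drifted, it may be worth checking the source file itself. One possible cause of values landing in the wrong columns (an assumption on my side, not something confirmed in this thread) is a row whose field count differs from the header, for example because of an unquoted delimiter inside a value. A minimal Python sketch of such a check follows; the function name find_misaligned_rows and the sample data are hypothetical.

```python
import csv
import io


def find_misaligned_rows(text, delimiter=","):
    """Return the header width and (line number, field count) for each row that differs."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    bad = []
    for fields in reader:
        if len(fields) != expected:
            # reader.line_num is the number of source lines read so far.
            bad.append((reader.line_num, len(fields)))
    return expected, bad


# Hypothetical sample: the row for id 2 has an unquoted comma inside the city value.
SAMPLE = (
    "id,city,amount\n"
    "1,Pune,10\n"
    "2,Bengaluru, KA,20\n"
    '3,"Mysuru, KA",30\n'
)

expected, bad = find_misaligned_rows(SAMPLE)
```

    If a check like this reports nothing on the real file, the source parsing is probably not the issue and the sink side is the next place to look.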

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.

