Converting csv to parquet is not populating correct values in the columns in parquet file

Question

Madugundu Somashekara, Roopa 46

Hi Team,
I am trying to convert a simple .csv file with a lot of columns (~161) into a Parquet file using a mapping data flow. All the columns show correct values in 'Data Preview', but when the data flow is run, some columns in the generated Parquet file contain drifted values, which is wrong. I am not sure what is happening on the destination side.
For the destination setting, we are using 'Use Current Partition'.
Could you help us understand where we are going wrong?

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2024-01-08T06:12:35.73+00:00

Hi Madugundu Somashekara, Roopa,

Just following up to see if the answer below helped. Please consider clicking Accept Answer, as accepted answers help the community as well. Also, please click Yes on the survey 'Was the answer helpful'.

2 answers

Answer 1

Vinodh247 34,661 MVP Volunteer Moderator

Hi Madugundu Somashekara, Roopa:

Thanks for reaching out to Microsoft Q&A.

I see that a similar question from you has been answered by Mark and Kranthi. While that might not be the solution for this issue, it shows that CSV does not preserve data types.

Did you try the same options and still get these value mismatches?

https://learn.microsoft.com/en-us/answers/questions/856132/converting-csv-to-parquet-format-with-correct-data
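To illustrate the point that CSV does not carry data types, here is a minimal Python sketch using only the standard library. The sample data, the column names, and the `SCHEMA` mapping are all made up for illustration; they are not taken from the file in the question, and this is not how a mapping data flow infers or applies types.

```python
import csv
import io
from datetime import date

# Hypothetical sample data; column names are invented for illustration.
SAMPLE = "id,amount,created\n1,10.50,2024-01-03\n2,7,2024-01-04\n"

# Hypothetical explicit schema: column name -> converter function.
SCHEMA = {"id": int, "amount": float, "created": date.fromisoformat}


def read_untyped(text):
    """Read CSV rows as-is. Every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def read_typed(text, schema):
    """Read CSV rows and convert each value with the converter for its column."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({name: schema[name](value) for name, value in row.items()})
    return rows


if __name__ == "__main__":
    print(read_untyped(SAMPLE)[0])
    print(read_typed(SAMPLE, SCHEMA)[0])
```

Without an explicit schema, the reader has no way to know that `amount` is numeric or that `created` is a date, which is why the types have to be declared somewhere before writing to a typed format such as Parquet.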

Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.

Madugundu Somashekara, Roopa 46 Reputation points

2024-01-03T12:04:53.8233333+00:00

Hi @Vinodh247
Yes, I did try that, but with no luck!

Vinodh247 34,661 Reputation points MVP Volunteer Moderator

2024-01-04T05:20:21.36+00:00

Do only specific columns have incorrect data in the Parquet file, or all columns?

Madugundu Somashekara, Roopa 46 Reputation points

2024-01-04T05:22:38.7833333+00:00

Only a few columns have the issue.
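When only some columns are affected, one thing worth ruling out is rows in the source CSV whose field count differs from the header, since such rows push values into the wrong columns. The thread does not confirm that this is the cause here; the sketch below is only a diagnostic under that assumption. The function name `find_misaligned_rows` is hypothetical, and it uses only Python's standard `csv` module.

```python
import csv
import io


def find_misaligned_rows(text, delimiter=","):
    """Return (expected_field_count, problems) for CSV text.

    problems is a list of (record_number, field_count) tuples for every
    record whose field count differs from the header. The header counts
    as record 1.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to check.
        return 0, []
    expected = len(header)
    problems = []
    for record_no, row in enumerate(reader, start=2):
        if len(row) != expected:
            problems.append((record_no, len(row)))
    return expected, problems


if __name__ == "__main__":
    # Hypothetical sample: record 3 has an extra field, record 4 is short.
    sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n"
    print(find_misaligned_rows(sample))
```

For a real file, the text could be read with `open(path, newline="")` and passed to the same function. An empty `problems` list would suggest looking elsewhere, for example at the schema and type settings discussed above.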

Answer 2

Hi Madugundu Somashekara, Roopa,

Thank you for using the Microsoft Q&A platform and for posting your query here.

As I understand it, you are trying to convert CSV into Parquet using a data flow in an ADF pipeline but are not getting the expected results.

Could you please try the 'Map drifted' option in the Data Preview tab?

Go to the Data Preview tab and click Refresh to fetch a data preview. If Data Factory detects that drifted columns exist, you can click Map Drifted and generate a derived column that allows you to reference all drifted columns in schema views downstream.

[Screenshot: the Data Preview tab with Map drifted called out.]

For more details, kindly check this documentation: Map drifted columns quick action
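As a rough illustration of what "drifted columns" means, the sketch below treats them as columns that arrive in the data but are not part of the schema defined on the source. This is only a conceptual Python example: the function name `find_drifted_columns` and the column names are hypothetical, it assumes exact, case-sensitive name matching, and it does not reproduce how Data Factory detects drift internally.

```python
def find_drifted_columns(defined_schema, incoming_columns):
    """Return incoming column names that are not in the defined schema.

    Order of the incoming columns is preserved. Matching is exact and
    case-sensitive, which is an assumption made for this sketch.
    """
    defined = set(defined_schema)
    return [name for name in incoming_columns if name not in defined]


if __name__ == "__main__":
    # Hypothetical schema and header, invented for illustration.
    defined = ["id", "name"]
    incoming = ["id", "name", "rating", "genre"]
    print(find_drifted_columns(defined, incoming))
```

Running a check like this against the CSV header and the source projection can show whether the problem columns are ones the data flow treats as drifted.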

Hope it helps. Kindly accept the answer by clicking the Accept answer button. Thank you.
