Error 2200: Copy Activity - Partitioned Parquet Column Not Found

Question

Things I've tried:

I have checked the pipeline's JSON and none of the columns have a space in them.
I added in the "physicalType: "UTF8" to the JSON as it was missing beforehand.
The schemas for the source dataset, sink dataset, and copy activity are all imported schemas - none of them are hardcoded.
I've tried unchecking and re-checking the "Enable Partition Discovery" on the Source dataset in the Copy Activity

Other things to note: the source is a blob container. The sink is a table in Azure SQL.

IMAGE 1: Error
https://ibb.co/qjQpZCW

IMAGE 2: Original Copy Activity Mapping
https://ibb.co/6YgBKzx

IMAGE 3: Updated Copy Activity Mapping after editing the JSON
https://ibb.co/P13KwVb

IMAGE 4: Added physicalType to JSON
https://ibb.co/SXpXM1p

IMAGE 5: Source File Structure
https://ibb.co/G2QV9Dc

I'm sure it's simple, but I've run into this error before and ended up dropping that column. But for this, I'd love to be able to keep the time_period column in this Copy Activity.

Thanks in advance!

Answer

Hello @Mariah ,
Thanks for the ask and using Microsoft Q&A platform .
I think that when specifying the paraquet file path , you are using the the wildcard path . Can you check that please ?

If thats the case there are files in the folder which are not having the column mentioned the error .

If I were you I could have tried to use the filepath ( to test this logic ) and if filepath worked and could have looked into other files .

Please do let me know how it goes .
Thanks
Himanshu

-------------------------------------------------------------------------------------------------------------------------

Please don't forget to click on or upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
Want a reminder to come back and check responses? Here is how to subscribe to a notification
If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

Answer

UPDATE: I have figured out how to get time_period into Azure SQL DB (see below), but the entire column is NULL in Azure SQL DB. Still looking for help on this.

----------

I figured it out! Whew - praise God!

I had to remove the time_period (partition) column in the Mapping section of the Copy Activity. I realized that the source dataset did not have time_period (the partition column) column in it when I did "preview data" in the "Datasets" section of ADF (after clicking in the blue box - see image). So by removing the time_period column in the Mapping section of the Copy Activity, the Mapping section was in agreement with the source dataset's schema (blue box).

I still have the "Enable partition discovery" box checked as well as the "Recursively" box checked. I did not specify the "Partition root path". Image

Now when I do "Preview data" in the Copy Activity (green box), the partition (time_period) shows up and the Copy Activity runs successfully.

Error 2200: Copy Activity - Partitioned Parquet Column Not Found

2 answers