Error 2200: Copy Activity - Partitioned Parquet Column Not Found

Mariah 136 Reputation points
2021-10-07T19:55:13.007+00:00

Things I've tried:

  • I have checked the pipeline's JSON and none of the columns have a space in them.
  • I added in the "physicalType: "UTF8" to the JSON as it was missing beforehand.
  • The schemas for the source dataset, sink dataset, and copy activity are all imported schemas - none of them are hardcoded.
  • I've tried unchecking and re-checking the "Enable Partition Discovery" on the Source dataset in the Copy Activity

Other things to note: the source is a blob container. The sink is a table in Azure SQL.

IMAGE 1: Error
https://ibb.co/qjQpZCW

IMAGE 2: Original Copy Activity Mapping
https://ibb.co/6YgBKzx

IMAGE 3: Updated Copy Activity Mapping after editing the JSON
https://ibb.co/P13KwVb

IMAGE 4: Added physicalType to JSON
https://ibb.co/SXpXM1p

IMAGE 5: Source File Structure
https://ibb.co/G2QV9Dc

I'm sure it's simple, but I've run into this error before and ended up dropping that column. But for this, I'd love to be able to keep the time_period column in this Copy Activity.

Thanks in advance!

Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,427 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,525 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2021-10-08T19:21:54.067+00:00

    Hello @Mariah ,
    Thanks for the ask and using Microsoft Q&A platform .
    I think that when specifying the paraquet file path , you are using the the wildcard path . Can you check that please ?

    138955-image.png

    If thats the case there are files in the folder which are not having the column mentioned the error .

    If I were you I could have tried to use the filepath ( to test this logic ) and if filepath worked and could have looked into other files .

    Please do let me know how it goes .
    Thanks
    Himanshu

    -------------------------------------------------------------------------------------------------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

  2. Mariah 136 Reputation points
    2021-10-10T00:47:44.23+00:00

    UPDATE: I have figured out how to get time_period into Azure SQL DB (see below), but the entire column is NULL in Azure SQL DB. Still looking for help on this.

    ----------

    I figured it out! Whew - praise God!

    I had to remove the time_period (partition) column in the Mapping section of the Copy Activity. I realized that the source dataset did not have time_period (the partition column) column in it when I did "preview data" in the "Datasets" section of ADF (after clicking in the blue box - see image). So by removing the time_period column in the Mapping section of the Copy Activity, the Mapping section was in agreement with the source dataset's schema (blue box).

    I still have the "Enable partition discovery" box checked as well as the "Recursively" box checked. I did not specify the "Partition root path". Image

    Now when I do "Preview data" in the Copy Activity (green box), the partition (time_period) shows up and the Copy Activity runs successfully.

    139127-2021-10-09-20h44-34.png