Having problems with my mapping data flow with JSON as input

Jonatan 20 Reputation points


I previously posted this: https://learn.microsoft.com/en-us/answers/questions/1323602/having-problems-with-my-daily-load-of-data-through

Since I'm not getting any responses I'm doing a new thread with the latest update.

So we believe to have found the issue, namely that ADF when reading several JSON-files takes a sample of 1000 rows and then infer the schema from those, see image. If the columns does not appear in those 1000 rows, ADF seems to ignore the entire file that has those columns.Skärmavbild 2023-09-01 kl. 12.48.33

I have tried circumventing this by using a sample file that contains all my columns and then importing that schema as projection. I'm still getting the same issue though.

Azure SQL Database
Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
1,949 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
7,994 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 26,716 Reputation points Microsoft Employee

    Hi Jonatan ,

    Glad that you figured out the way to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others ", I'll repost your solution in case you'd like to "Accept " the answer.

    Replacing Null-values with the dummy value of '-1' helped resolving the issue.

    Kindly accept the answer by clicking on Accept answer button so that the solution reaches the community in case anyone is facing similar issue. Thankyou.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Amira Bedhiafi 4,081 Reputation points

    The root of the problem appears to be ADF's mechanism of inferring the schema by sampling just 1000 rows. If any columns are absent in this sample, ADF seems to neglect files that include those columns. Despite attempting to bypass this issue by employing a sample file containing all necessary columns and importing its schema as a projection, the problem persists. This situation serves as a cautionary note for users of Azure SQL Database, Azure Blob Storage, and ADF when dealing with JSON datasets.

    0 comments No comments

  2. AnnuKumari-MSFT 26,716 Reputation points Microsoft Employee

    Hi Jonatan ,

    Thankyou for reaching out to Microsoft Q&A platform.

    Did you try checking the 'Allow schema drift' option in source and sink transformation?

    Schema drift is the case where your sources often change metadata. Fields, columns, and, types can be added, removed, or changed on the fly. Without handling for schema drift, your data flow becomes vulnerable to upstream data source changes

    Schema drift source

    It would be great if you could share the dataflow details here as screenshot so that it can give a better insight. Thankyou