In ADF, dataflow activity, Ingest JSON file and structure from source, transform values, sink to destination with same JSON structure as source but with updated transformed values (Dynamic schema)

Wardie;) 0 Reputation points
2023-05-24T15:13:05.9766667+00:00

Hey there!

I'll try and keep this as simple as possible!

In an ADF data flow activity, is it possible to dynamically ingest JSON files of various hierarchical structures from a source location, transform various columns (e.g. anonymisation of data), and then sink each JSON file to the destination using the same structure it was ingested with, but with the updated/transformed values?

The workaround I've currently implemented is an uber-schema that covers every eventuality in the mapping. However, this means the output to sink follows a fixed, explicit schema, which produces lots of NULL/empty values and objects.

I can't go down the avenue of flattening the file either because it means I have to re-roll the objects back up again at sink.
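
To make the requirement concrete, here is a minimal sketch in plain Python (outside ADF, not data flow script) of a structure-preserving transform: walk the JSON recursively, mask selected fields, and leave every other key, nesting level and array exactly as it arrived. The field names in `SENSITIVE_KEYS`, the `mask` helper and the sample document are hypothetical and only there for illustration.

```python
import hashlib
from typing import Any

# Hypothetical field names to anonymise; not taken from any real schema.
SENSITIVE_KEYS = {"name", "email"}


def mask(value: Any) -> str:
    """Replace a value with a short, deterministic SHA-256 digest."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def anonymise(node: Any, sensitive: set = SENSITIVE_KEYS) -> Any:
    """Return a new object with the same shape as node, masking sensitive scalars."""
    if isinstance(node, dict):
        result = {}
        for key, val in node.items():
            is_scalar = not isinstance(val, (dict, list))
            if key in sensitive and is_scalar:
                result[key] = mask(val)
            else:
                # Recurse so nested objects and arrays keep their structure.
                result[key] = anonymise(val, sensitive)
        return result
    if isinstance(node, list):
        return [anonymise(item, sensitive) for item in node]
    # Non-sensitive scalars pass through unchanged.
    return node


# Hypothetical sample document with nested objects and an array.
doc = {"id": 1, "customer": {"name": "Ann", "orders": [{"email": "a@b.c", "total": 9.5}]}}
out = anonymise(doc)
```

No schema is declared anywhere in this sketch, so files with different hierarchies go through the same function and nothing is padded with NULLs. That is the behaviour I'm hoping to reproduce in the data flow.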

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-05-29T12:59:03.3966667+00:00

    Hi Wardie;) ,

    Thank you for using the Microsoft Q&A platform and for posting your question here.

    As I understand it, you want to know whether a data flow can handle changes in the source data and reflect them, with the updated values, in the sink.

    I assume that when you say the source has transformed data, you mean that not only are the values updated in the source but the source schema changes as well? Please correct me if my understanding is wrong.

    In such cases, you can leverage the Allow schema drift option, which is present in both the source and sink transformations.

    • When schema drift is enabled, all incoming fields are read from your source during execution and passed through the entire flow to the Sink.
    • By default, all newly detected columns, known as drifted columns, arrive as a string data type. If you wish for your data flow to automatically infer data types of drifted columns, check Infer drifted column types in your source settings.

    To enable schema drift, check Allow schema drift in your source transformation.


    For more details, kindly check this documentation: Schema drift in mapping data flow

    Hope it helps. Please let us know whether the suggestion resolved your query. Thank you.

