Load different JSON files

Vineet S 910 Reputation points
2024-09-09T11:51:32.6733333+00:00

How can I copy 3 JSON files with different data, without a union, given that they have different columns, and without adding the columns manually (there are too many of them)? Sample file:

{
  "id": 10000,
  "id": null,
  "data": {
    "one": null,
    "two": "",
    "limit": {
      "currency": "USD",
      "limit": 0.0
    },
    "ok": 0.0
  }
}

Azure Data Factory

1 answer

  1. Amira Bedhiafi 24,556 Reputation points
    2024-09-09T14:12:24.0666667+00:00

    You need 6 steps to achieve your goal; a short sketch of the equivalent logic follows the list:

    1. Create a Data Flow in Azure Data Factory. This allows you to handle transformations and combine different structures dynamically.
    2. Source Dataset:
      • Define a dataset for each of your JSON files. You can read from separate file paths or pick up multiple files from a single folder.
      • In the source transformation, choose JSON as the file type. This allows the schema to be read from the JSON files.
    3. Flatten the JSON (if necessary):
      • If your JSON files have nested structures (like the "limit" object in your example), use the Flatten transformation to bring those nested objects to the top level.
    4. Schema Drift Handling:
      • Enable Schema Drift in your source transformation. Schema drift allows ADF to handle data with dynamic or changing columns, meaning you don't need to manually define the columns. ADF will automatically handle any new or missing columns between the different JSON files.
        • Tick "Allow schema drift" in the source settings of the source transformation.
    5. Select Required Columns (Optional):
      • If you only need specific columns from each JSON file, you can use the Select transformation to choose the columns. The schema drift option will ensure any columns missing from one file but present in another are handled.
    6. Sink Transformation:
      • Finally, direct the output of the transformation to a sink dataset (for example, Blob Storage or a database). The sink can also be set to allow schema drift and map columns automatically, so you still don't have to define them.
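
    None of these steps require code inside ADF itself; everything is configured in the Data Flow UI. Purely to illustrate what steps 3 to 5 amount to (flatten the nested objects, then combine records whose columns differ without listing them by hand), here is a minimal Python/pandas sketch. The file names are hypothetical placeholders, and this is not what ADF runs internally.

    import json
    import pandas as pd

    files = ["file1.json", "file2.json", "file3.json"]  # hypothetical paths

    frames = []
    for path in files:
        with open(path) as f:
            record = json.load(f)
        # Flatten nested objects (e.g. data.limit.currency) into top-level
        # columns, similar to ADF's Flatten transformation.
        frames.append(pd.json_normalize(record, sep="."))

    # Combine the frames without declaring columns: a column present in one
    # file but missing in another comes through as a missing value (written
    # out as null), which is the effect "Allow schema drift" gives you in the
    # ADF source and sink.
    combined = pd.concat(frames, ignore_index=True, sort=False)

    # Write the combined result; in ADF this is the sink step.
    combined.to_json("combined.json", orient="records", lines=True)

    With your sample, the nested limit object becomes columns such as data.limit.currency and data.limit.limit, and any column that exists in only one of the three files simply shows up as null in the combined output.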
