You need six steps to achieve your goal:
- Create a Data Flow:
  - Create a Mapping Data Flow in Azure Data Factory. It lets you handle transformations and combine different JSON structures dynamically.
- Source Dataset:
  - Define a dataset for each of your JSON files, or point a single dataset at a folder to read multiple files at once.
  - In the source transformation, choose JSON as the file format so the schema can be read from the JSON files; a script sketch follows below.
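A minimal sketch of such a source in Data Flow Script (visible via the "Script" button above the data flow canvas); the stream name `JsonSource` and the `documentForm` value are assumptions for illustration, not taken from your setup:

```
source(allowSchemaDrift: true,
    validateSchema: false,
    documentForm: 'arrayOfDocuments') ~> JsonSource
```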
- Flatten the JSON (if necessary):
  - If your JSON files have nested structures, use the Flatten transformation. It unrolls arrays into rows, and its column mapping can also surface sub-fields of nested objects (like the "limit" object in your example) at the top level via dot notation, e.g. limit.max; see the sketch below.
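Continuing the sketch, a Flatten that unrolls a hypothetical array column `records` and maps the nested `limit` fields to top-level names (all column names here are assumptions):

```
JsonSource foldDown(unroll(records),
    mapColumn(
        id,
        limitMax = limit.max,
        limitMin = limit.min
    ),
    skipDuplicateMapInputs: false,
    skipDuplicateMapOutputs: false) ~> FlattenJson
```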
- Schema Drift Handling:
  - Enable "Allow schema drift" in the Projection tab of the source transformation. Schema drift lets ADF handle data with dynamic or changing columns, so you don't need to define them manually: ADF automatically handles any columns that are new or missing between the different JSON files. The sketch below shows how a drifted column can be referenced.
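With drift enabled (`allowSchemaDrift: true` on the source, as in the first sketch), a drifted column can still be referenced explicitly via `byName()` in a Derived Column. A sketch, assuming a hypothetical `status` field that only some of your files contain:

```
FlattenJson derive(status = toString(byName('status'))) ~> MapDrifted
```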
- Select Required Columns (Optional):
  - If you only need specific columns from each JSON file, use the Select transformation to choose them. With schema drift enabled, columns missing from one file but present in another are still handled; a sketch follows.
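A sketch of such a Select, keeping only the columns from the earlier snippets (files that lack a given column simply yield NULL for it):

```
MapDrifted select(mapColumn(
        id,
        limitMax,
        status
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> SelectColumns
```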
- Sink Transformation:
  - Finally, direct the output of the transformation to a sink dataset (for example Blob Storage or a database). The sink can also be set to handle schema changes dynamically if you still don't want to define columns, as in the sketch below.
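A closing sketch of the sink, again with drift allowed so the output columns don't have to be declared up front (`JsonSink` is an assumed name; the target is whatever dataset you configure):

```
SelectColumns sink(allowSchemaDrift: true,
    validateSchema: false,
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> JsonSink
```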