Create a dataset that points to the location of your JSON data (you can choose Blob storage).
Use the Flatten transformation to unroll the nested arrays. Your JSON data has dates as keys, and the values are arrays. You'll need to flatten this structure.
After the Flatten transformation, you'll have an additional column which holds the date (the keys like "2023-02-06"). Rename this column to "date".
Now, you'll have a flattened table, but there might be additional columns based on the JSON structure (because not all objects have the same keys). You'll want to do a few things:
- Use the Select transformation to pick only the columns you want:
date
,breakdown
,pageviewsMinusExits
,rawViews
,pageTime
,timePerPageview
,standardViews
,entrances
. -Use the Derived Column transformation to ensure that missing columns get default values. For example, if "entrances" is missing for some rows, you'd specify a default value (like 0 or null). -Use Sort transformation to order the data by date or any other column if necessary. Then, create a dataset that points to the location where you want the output (like a table in SQL Database or a file in Blob storage).
- Source Dataset: Create a dataset that points to the location of your JSON data (e.g., Blob storage).
- Dataflow Transformation:
a. Use the Flatten transformation to unroll the nested arrays. Your JSON data has dates as keys, and the values are arrays. You'll need to flatten this structure.
b. After the Flatten transformation, you'll have an additional column which holds the date (the keys like "2023-02-06"). Rename this column to "date".
c. Now, you'll have a flattened table, but there might be additional columns based on the JSON structure (because not all objects have the same keys). You'll want to do a few things:
-Use the Select transformation to pick only the columns you want:
date
,breakdown
,pageviewsMinusExits
,rawViews
,pageTime
,timePerPageview
,standardViews
,entrances
.
-Use the Derived Column transformation to ensure that missing columns get default values. For example, if "entrances" is missing for some rows, you'd specify a default value (like 0 or null).
-Use Sort transformation to order the data by date or any other column if necessary.
- Pipeline: a. Drag and drop a
Copy Data
activity. b. Set the source to the dataset pointing to the JSON data. c. Set the transformation to the dataflow you defined. d. Set the sink to the output dataset. - Debug/Trigger: Once everything is set, you can debug the pipeline or trigger it to run. After the run, check the output location for the desired format.