Data Flow transformation on json object with date key

Akshay Kumar Debta 0 Reputation points
2023-09-06T14:33:01.75+00:00
I'm extracting data from a REST API, and while transforming it in a data flow I am not able to get the output into the correct shape because the columns are not showing properly in the mapping. Kindly find the mapping screenshot and data format below.


[Mapping screenshot]

Data:
{
    "2023-02-06": [
        {
            "breakdown": "101350281505",
            "exits": 1,
            "pageviewsMinusExits": 1,
            "exitsPerPageview": 0.5,
            "rawViews": 2,
            "pageTime": 815,
            "timePerPageview": 815.0,
            "standardViews": 2
        }
    ],
    "2023-02-10": [
        {
            "breakdown": "101350281505",
            "newVisitorRawViews": 1,
            "exits": 1,
            "pageviewsMinusExits": 0,
            "exitsPerPageview": 1.0,
            "rawViews": 1,
            "pageBounces": 1,
            "standardViews": 1,
            "entrances": 1,
            "pageBounceRate": 1.0
        }
    ],
    "2023-02-16": [
        {
            "breakdown": "101350281505",
            "pageviewsMinusExits": 5,
            "rawViews": 5,
            "pageTime": 3082,
            "timePerPageview": 616.4,
            "standardViews": 5,
            "entrances": 1
        }
    ],
    ...
}
How do I convert the above into the correct format, like this?


date        breakdown     pageviewsMinusExits  rawViews  pageTime  timePerPageview  standardViews  entrances
2023-02-10  101350281505  0                    1         -         -                1              1
2023-02-16  101350281505  5                    5         3082      616.4            5              1




Azure Data Factory

2 answers

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-09-07T12:29:02.57+00:00

    1. Source dataset: Create a dataset that points to the location of your JSON data (e.g., Blob storage).
    2. Data flow transformations:
       a. Use the Flatten transformation to unroll the nested arrays. Your JSON data has dates as keys and arrays as values, so this structure needs to be flattened.
       b. After the Flatten transformation, you'll have an additional column holding the date (the keys like "2023-02-06"). Rename this column to "date".
       c. The flattened table may still contain extra columns, because not all objects have the same keys. To handle this:
          - Use the Select transformation to pick only the columns you want: date, breakdown, pageviewsMinusExits, rawViews, pageTime, timePerPageview, standardViews, entrances.
          - Use the Derived Column transformation to ensure that missing columns get default values. For example, if "entrances" is missing for some rows, specify a default value (like 0 or null).
          - Use the Sort transformation to order the data by date, or any other column, if necessary.
    3. Sink dataset: Create a dataset that points to the location where you want the output (like a table in SQL Database or a file in Blob storage).
    4. Pipeline: Add a Data flow activity to a pipeline and point it at the data flow you defined, with the JSON dataset as the source and the output dataset as the sink.
    5. Debug/Trigger: Once everything is set, debug the pipeline or trigger it to run. After the run, check the output location for the desired format.
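
    To sanity-check what the data flow should produce, here is a minimal Python sketch of the same reshaping: flatten by date key, default missing metrics to 0, keep only the wanted columns, and sort by date. The column list and the 0 default are assumptions taken from the question, and "response.json" is a hypothetical local copy of the API response; adjust as needed.

    ```python
    import json

    # Columns to keep, per the desired output shown in the question.
    WANTED = ["breakdown", "pageviewsMinusExits", "rawViews",
              "pageTime", "timePerPageview", "standardViews", "entrances"]

    def flatten(data: dict) -> list:
        """Turn {"2023-02-06": [{...}, ...], ...} into one row per object,
        promoting the date key to a 'date' column and defaulting missing
        metrics to 0 (an assumption; null/None would also work)."""
        rows = []
        for date, records in data.items():
            for record in records:
                row = {"date": date}
                for col in WANTED:
                    row[col] = record.get(col, 0)
                rows.append(row)
        return sorted(rows, key=lambda r: r["date"])

    with open("response.json") as f:  # hypothetical file name
        print(json.dumps(flatten(json.load(f)), indent=2))
    ```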

  2. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-09-12T08:36:23.8+00:00

    Hi Akshay Kumar Debta,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As I understand your query, you are trying to flatten data from a dataset that contains many date keys, each holding an array of complex data. Please let me know if that's not the correct understanding.

    Looking at the schema of the JSON you provided, this doesn't look feasible to achieve. First of all, the schema is not consistent across the arrays: notice that the 'timePerPageview' property is present in the '2023-02-06'[] array but not in the '2023-02-10'[] array. Moreover, your requirement is to do this for 1000 date arrays. If the key names and the schema were the same for all of the JSON, it would be feasible, but that's not the case, which makes it complicated.

    Below is the way I tried to flatten one of the arrays:

    1. Use a Source transformation and point your dataset to the JSON file. Select 'Array of documents' in the JSON settings. Although the date columns look like an array datatype, if you go to the Inspect tab you will find that the datatype is complex, not array.


    2. Use the collect function in an Aggregate transformation to convert the complex datatype into an array so that it can be flattened.


    3. Then use a Flatten transformation to unroll by the array columns.


    So the whole process needs to be repeated for every one of the array columns, since the JSON schema is not consistent and the key names do not match.

    I would suggest checking whether the JSON schema can be made consistent and valid. Otherwise, write your own custom code in Python, C#, or Java and run it with an Azure Function or Batch activity, as in the sketch below.
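
    If you take the custom-code route, the core logic is small. Here is a minimal Python sketch using pandas; the file names and the CSV sink are assumptions for illustration, and the same logic would sit inside an Azure Function or a Batch job:

    ```python
    import json
    import pandas as pd

    def flatten_by_date(payload: dict) -> pd.DataFrame:
        """Promote each date key to a 'date' column. pandas aligns the
        inconsistent keys across rows automatically, leaving NaN where
        a metric is absent for a given date."""
        rows = [{"date": date, **record}
                for date, records in payload.items()
                for record in records]
        return pd.DataFrame(rows).sort_values("date")

    # Hypothetical file names; adapt to your actual storage bindings.
    with open("input.json") as f:
        df = flatten_by_date(json.load(f))
    df.to_csv("output.csv", index=False)
    ```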

    Hope it helps. Thank you. Kindly accept the answer by clicking on the 'Accept answer' button.

