How to flatten a json with a list of jsons nested with Data Factory?

Víctor Díaz Cortés 20 Reputation points
2023-07-21T22:25:05.65+00:00

Hi everyone!

I've been working on a JSON file in Data Factory, and I'm having trouble properly flattening it. Here is a sample of the JSON file:

[
  {
    "_id": "fa8e22d6-c0b6-5229-bb9e-ad52eda39a0a",
    "actual_price": "2,999",
    "average_rating": "3.9",
    "brand": "York",
    "category": "Clothing and Accessories",
    "crawled_at": "02/10/2021, 20:11:51",
    "description": "Yorker trackpants made from 100% rich combed cotton giving it a rich look.Designed for Comfort,Skin friendly fabric,itch-free waistband & great for all year round use Proudly made in India",
    "discount": "69% off",
    "images": [
      "https://rukminim1.flixcart.com/image/128/128/jr3t5e80/track-pant/z/y/n/m-1005combo2-yorker-original-imafczg3xfh5qqd4.jpeg?q=70",
      "https://rukminim1.flixcart.com/image/128/128/jr58l8w0/track-pant/w/d/a/l-1005combo8-yorker-original-imafczg3pgtxgraq.jpeg?q=70"
    ],
    "out_of_stock": false,
    "pid": "TKPFCZ9EA7H5FYZH",
    "product_details": [
      {
        "Style Code": "1005COMBO2"
      },
      {
        "Closure": "Elastic"
      },
      {
        "Pockets": "Side Pockets"
      },
      {
        "Fabric": "Cotton Blend"
      },
      {
        "Pattern": "Solid"
      },
      {
        "Color": "Multicolor"
      }
    ],
    "seller": "Shyam Enterprises",
    "selling_price": "921",
    "sub_category": "Bottomwear",
    "title": "Solid Men Multicolor Track Pants",
    "url": "https://www.flipkart.com/yorker-solid-men-multicolor-track-pants/p/itmd2c76aadce459?pid=TKPFCZ9EA7H5FYZH&lid=LSTTKPFCZ9EA7H5FYZHVYXWP0&marketplace=FLIPKART&srno=b_1_1&otracker=browse&fm=organic&iid=177a46eb-d053-4732-b3de-fcad6ff59cbd.TKPFCZ9EA7H5FYZH.SEARCH&ssid=utkd4t3gb40000001612415717799"
  }]

Everything is pretty straightforward except for the "product_details" section, which contains a list of several JSON objects with different pieces of data each. Unfortunately, the tools available in Data Factory, like Flatten and Select, only recognize the first object in the "product_details" array.

    "product_details": [
      {
        "Style Code": "1005COMBO2"
      },
      {
        "Closure": "Elastic"
      },
      {
        "Pockets": "Side Pockets"
      },
      {
        "Fabric": "Cotton Blend"
      },
      {
        "Pattern": "Solid"
      },
      {
        "Color": "Multicolor"
      }
    ]

I'm looking for the best approach to process this kind of JSON files efficiently. How can I properly handle the "product_details" array with all its objects in Data Factory?

Thank you for your help!

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,225 questions
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 46,602 Reputation points Microsoft Employee
    2023-07-24T23:44:04.16+00:00

    @Víctor Díaz Cortés Welcome to Microsoft Q&A forum and thanks for reaching out here.

    In order to flatten product details, in Mapping data flow source transformation, first you will have to import the projection that way all the objects under product_details are populated as shown below:

    User's image

    Then click on data preview and see the details are populated for each object of product_details.

    productDetailsFlatten

    Then, from have a flatten transformation and for Unroll by select both top root {} and product_details and in the input column section click Add mapping -> Fixed mapping then in the columns section you will be able to select all the flatten columns as shown below:

    User's image

    User's image

    Hope this info helps. Let me know if you have any questions.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.