ADF - converting list of lists into a proper JSON format

Patryk Sobczak 0 Reputation points
2024-09-25T18:29:34.62+00:00

Hello,

I'm pretty new to ADF and I can't wrap my head around one case.

To keep it simple:

I have a source (REST API response), which theoretically is JSON but it doesn't have a "proper" JSON format.

It's supposed to be a simple table but it looks like that:

{
  "Fields": [
    "Region",
    "Market",
    "Country/Territory",
    "City",
    "2031"
  ],
  "Data": [
    [
      "Europe",
      "Central Europe",
      "Czech Republic",
      "Kvasiny",
      43323
    ],
    [
      "Europe",
      "Central Europe",
      "Czech Republic",
      "Mlada Boleslav",
      40224
    ],
    [
      "Europe",
      "Central Europe",
      "Hungary",
      "Esztergom",
      40416
    ],
    [
      "Europe",
      "Central Europe",
      "Hungary",
      "Kecskemet",
      0
    ]
  ]
}

Now, I'd like to use a Parquet file or at least a JSON file as a sink.

How can I approach this?

I was playing with mapping data flows but I'm not sure on how to proceed.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2024-09-25T19:58:58.4666667+00:00

    Hello Patryk Sobczak,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to manage the conversion and storage of your data using Azure Data Factory.

    Regarding your questions, using a Parquet or JSON File as a Sink required many steps however, the links below will guide you through:

    Also, use this link below to see more details on how to convert List of Lists into Proper JSON Format in ADF:

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.


  2. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2024-09-27T11:04:01.97+00:00

    @Patryk Sobczak

    Thank you for posting query in Microsoft Q&A Platform.

    Let’s dive deeper into the specifics of transforming your list of lists into a proper JSON format using Azure Data Factory (ADF).

    Detailed Steps

    • Ensure your source dataset is correctly configured to read the JSON data from your REST API.
    • Create a new Data Flow in ADF.
    • Add a Source transformation and point it to your source dataset.
    • Add a Derived Column transformation to create a new column for each field in your “Fields” array. This will help in mapping the data correctly.
    • Example:JSON
        {
          "Region": byName("Data"),
          "Market": byName("Data"),
          "Country/Territory": byName("Data"),
          "City": byName("Data"),
          "2031": byName("Data")
        }
        
      
    • Add a Flatten transformation to flatten the “Data” array.
    • In the Flatten transformation, set the “Unroll by” to the “Data” array.
    • Add a Select transformation to map the flattened fields to the desired schema.
    • Rename the columns as needed to match your target schema.
    • Add a Sink transformation and configure it to write to a Parquet or JSON file.
    • Ensure the sink dataset is correctly configured to point to your desired storage location.

    Example JSON Data Flow

    Here’s an example of how your JSON data flow might look:

    JSON

    {
      "name": "FlattenJSONDataFlow",
      "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
          "sources": [
            {
              "name": "Source1",
              "dataset": {
                "referenceName": "YourSourceDataset",
                "type": "DatasetReference"
              }
            }
          ],
          "transformations": [
            {
              "name": "DerivedColumn1",
              "type": "DerivedColumn",
              "typeProperties": {
                "columns": {
                  "Region": "byName('Data')",
                  "Market": "byName('Data')",
                  "Country/Territory": "byName('Data')",
                  "City": "byName('Data')",
                  "2031": "byName('Data')"
                }
              }
            },
            {
              "name": "Flatten1",
              "type": "Flatten",
              "typeProperties": {
                "unrollBy": "Data"
              }
            },
            {
              "name": "Select1",
              "type": "Select",
              "typeProperties": {
                "select": [
                  {
                    "name": "Region",
                    "alias": "Region"
                  },
                  {
                    "name": "Market",
                    "alias": "Market"
                  },
                  {
                    "name": "Country/Territory",
                    "alias": "Country"
                  },
                  {
                    "name": "City",
                    "alias": "City"
                  },
                  {
                    "name": "2031",
                    "alias": "Year2031"
                  }
                ]
              }
            },
            {
              "name": "Sink1",
              "type": "Sink",
              "dataset": {
                "referenceName": "YourSinkDataset",
                "type": "DatasetReference"
              }
            }
          ]
        }
      }
    }
    
    

    Troubleshooting Tips

    • Check Data Types: Ensure that the data types of your columns are correctly set in the Derived Column transformation.
    • Debugging: Use the Data Flow Debug feature to preview the data at each transformation step.
    • Error Messages: Pay attention to any error messages or warnings in the ADF UI, as they can provide clues on what might be going wrong

    Hope this helps. Do let us know if you any further queries.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.