JSON format arrays

Victor Marcelo Hinojosa Solis 20 Reputation points
2024-02-26T02:14:16.05+00:00

I have this JSON returned from an API, but I don't know how to save the data into a Parquet file, because the JSON structure is unusual: it is just parallel arrays. Thanks for your time. Regards,

{
    "Name": [
        "",
        "Water level",
        "Velocity",
        "Discharge",
        "Area",
        "Temperature Cond",
        "Conductivity",
        "pH",
        "Bat. Volt",
        "Salinity",
        "Redox"
    ],
    "Unit": [
        "",
        "mm",
        "m/s",
        "L/s",
        "m³",
        "°C",
        "mS/cm",
        "pH",
        "V",
        "g/Kg",
        "mV"
    ],
    "ID": [
        "",
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10
    ],
    "Index": [
        "",
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10
    ],
    "Data": [
        [
            "2024-01-15T04:00:00",
            216,
            0.988,
            353.4,
            0.43,
            8.2309,
            664.76,
            4.5358,
            12.4,
            0.3563,
            349.02
        ],
        [
            "2024-01-15T04:10:00",
            221,
            1.003,
            367.57,
            0.44,
            8.1503,
            647.37,
            4.5387,
            12.4,
            0.3469,
            349.43
        ],
        [
            "2024-01-15T04:20:00",
            218,
            1.016,
            366.98,
            0.44,
            8.0708,
            631.36,
            4.5409,
            12.4,
            0.3384,
            349.85
        ],
        [
            "2024-01-15T04:30:00",
            219,
            1.009,
            366.22,
            0.44,
            7.969,
            617.72,
            4.5426,
            12.38,
            0.3311,
            350.18
        ],
        [
            "2024-01-15T04:40:00",
            216,
            1.016,
            363.42,
            0.43,
            7.9577,
            604.96,
            4.5434,
            12.38,
            0.3242,
            350.53
        ],
        [
            "2024-01-15T04:50:00",
            220,
            0.998,
            363.98,
            0.44,
            7.9153,
            597.72,
            4.5448,
            12.4,
            0.3203,
            350.87
        ],
        [
            "2024-01-15T05:00:00",
            217,
            0.998,
            358.73,
            0.43,
            7.837,
            595.51,
            4.5522,
            12.38,
            0.3191,
            350.51
        ]
    ]
}
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Debarchan Sarkar - MSFT 1,131 Reputation points Microsoft Employee
    2024-02-27T07:33:43.3066667+00:00

    In the provided JSON structure, the data is arranged in a format that isn't directly convertible to a traditional columnar format, which may be why Copy Activity doesn't infer the schema correctly. Here is one approach using an Azure Data Factory (ADF) Data Flow:

    1. Flatten the JSON: use the Parse transformation. The output will be a single row with multiple array columns.
    2. Split into multiple rows: use the Unpivot transformation, once for each of the arrays ("Name", "Unit", "ID", "Index", "Data").
    3. Combine the data: join the resulting frames together based on the index of each element in the original array, using the Join transformation.
    4. Save as Parquet: now that you have a traditional table-like result, write it out with a Sink operation.
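    Outside of ADF, the same flatten-and-join idea is a few lines in a script. This is a minimal pandas sketch, not from the thread itself; it assumes pandas with a Parquet engine (pyarrow) is available, and it treats the blank first entry of `"Name"` as the timestamp column, which matches the sample payload:

    ```python
    import json
    import pandas as pd

    # Sample payload shaped like the API response (truncated to two readings).
    payload = json.loads("""
    {
      "Name": ["", "Water level", "Velocity"],
      "Data": [
        ["2024-01-15T04:00:00", 216, 0.988],
        ["2024-01-15T04:10:00", 221, 1.003]
      ]
    }
    """)

    # The blank first "Name" entry lines up with the timestamp in each Data row,
    # so replace it with an explicit column name.
    columns = ["Timestamp"] + payload["Name"][1:]

    # Each inner "Data" array is one row; Name supplies the headers.
    df = pd.DataFrame(payload["Data"], columns=columns)
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])

    # Write a columnar Parquet file (requires pyarrow or fastparquet).
    df.to_parquet("readings.parquet", index=False)
    ```

    The key observation is that `"Data"` is already row-oriented, so no unpivot is needed here: the other arrays only supply column metadata.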

    Please note that the above method is one way to process this kind of data in ADF. Depending on the exact nature and size of your data, there may be other methods that could be more efficient or appropriate. Also, please be aware that debugging and developing data flows in ADF can be time-consuming due to the serverless nature of the service. Make sure to plan accordingly and test each step thoroughly. If you have further questions or if something is not clear, feel free to ask!


1 additional answer

Sort by: Most helpful
  1. Vinodh247 16,916 Reputation points
    2024-02-26T05:23:22.2033333+00:00

    Hi Victor Marcelo Hinojosa Solis,

    Thanks for reaching out to Microsoft Q&A.

    • If you have JSON files where multiple JSON documents are stored as a JSON array or line-delimited JSON files (where JSON documents are separated by new-line characters), you can use the OPENROWSET function to read the JSON content.
    • Unfortunately, there isn’t a direct built-in function to convert JSON to Parquet within Synapse Analytics currently.
    • You can try to use other tools or services (such as Azure Data Factory) to perform this conversion.
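    As one example of such an external tool, a short pandas script can also fold the payload's `"Unit"` array into self-describing column headers before writing Parquet. This sketch is an illustration, not from the thread; the header format `Name (Unit)` is my own choice:

    ```python
    import json
    import pandas as pd

    # Payload shaped like the API response (truncated to one reading).
    payload = json.loads("""
    {
      "Name": ["", "Water level", "Velocity"],
      "Unit": ["", "mm", "m/s"],
      "Data": [["2024-01-15T04:00:00", 216, 0.988]]
    }
    """)

    # Zip Name and Unit into combined headers; the blank first entries of
    # both arrays correspond to the timestamp column in each Data row.
    headers = ["Timestamp"] + [
        f"{name} ({unit})"
        for name, unit in zip(payload["Name"][1:], payload["Unit"][1:])
    ]

    df = pd.DataFrame(payload["Data"], columns=headers)
    ```

    Embedding the units in the headers keeps them from being lost, since Parquet column names are the only place this metadata would otherwise survive the conversion.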

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

