JSON format arrays

Victor Marcelo Hinojosa Solis 20 Reputation points
2024-02-26T02:14:16.05+00:00

I have this JSON returned from an API, but I don't know how to save the data into a Parquet file, because the JSON structure is unusual: it is just parallel arrays. Thanks for your time. Regards,

{
    "Name": [
        "",
        "Water level",
        "Velocity",
        "Discharge",
        "Area",
        "Temperature Cond",
        "Conductivity",
        "pH",
        "Bat. Volt",
        "Salinity",
        "Redox"
    ],
    "Unit": [
        "",
        "mm",
        "m/s",
        "L/s",
        "m³",
        "°C",
        "mS/cm",
        "pH",
        "V",
        "g/Kg",
        "mV"
    ],
    "ID": [
        "",
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10
    ],
    "Index": [
        "",
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10
    ],
    "Data": [
        [
            "2024-01-15T04:00:00",
            216,
            0.988,
            353.4,
            0.43,
            8.2309,
            664.76,
            4.5358,
            12.4,
            0.3563,
            349.02
        ],
        [
            "2024-01-15T04:10:00",
            221,
            1.003,
            367.57,
            0.44,
            8.1503,
            647.37,
            4.5387,
            12.4,
            0.3469,
            349.43
        ],
        [
            "2024-01-15T04:20:00",
            218,
            1.016,
            366.98,
            0.44,
            8.0708,
            631.36,
            4.5409,
            12.4,
            0.3384,
            349.85
        ],
        [
            "2024-01-15T04:30:00",
            219,
            1.009,
            366.22,
            0.44,
            7.969,
            617.72,
            4.5426,
            12.38,
            0.3311,
            350.18
        ],
        [
            "2024-01-15T04:40:00",
            216,
            1.016,
            363.42,
            0.43,
            7.9577,
            604.96,
            4.5434,
            12.38,
            0.3242,
            350.53
        ],
        [
            "2024-01-15T04:50:00",
            220,
            0.998,
            363.98,
            0.44,
            7.9153,
            597.72,
            4.5448,
            12.4,
            0.3203,
            350.87
        ],
        [
            "2024-01-15T05:00:00",
            217,
            0.998,
            358.73,
            0.43,
            7.837,
            595.51,
            4.5522,
            12.38,
            0.3191,
            350.51
        ]
    ]
}
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Debarchan Sarkar - MSFT 1,131 Reputation points Microsoft Employee
    2024-02-27T07:33:43.3066667+00:00

    In the provided JSON structure, the data is arranged in a format that isn't directly convertible to a traditional columnar format, which may be why Copy Activity doesn't infer the schema correctly. Here is one approach using an Azure Data Factory (ADF) Data Flow:

    1. Flatten the JSON: use the Parse transformation. The output will be a single row with multiple array columns.
    2. Split into multiple rows: use the Unpivot transformation, once for each of the arrays ("Name", "Unit", "ID", "Index", "Data").
    3. Combine the data: join the resulting frames together based on the index of each element in the original array, using the Join transformation.
    4. Save as Parquet: now that you have a traditional table-like result, write it out with a Sink operation.
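    Outside of ADF, the same flatten-and-join idea is a few lines in a script. This is a minimal pandas sketch, not from the thread itself; it assumes pandas with a Parquet engine (pyarrow) is available, and it treats the blank first entry of `"Name"` as the timestamp column, which matches the sample payload:

    ```python
    import json
    import pandas as pd

    # Sample payload shaped like the API response (truncated to two readings).
    payload = json.loads("""
    {
      "Name": ["", "Water level", "Velocity"],
      "Data": [
        ["2024-01-15T04:00:00", 216, 0.988],
        ["2024-01-15T04:10:00", 221, 1.003]
      ]
    }
    """)

    # The blank first "Name" entry lines up with the timestamp in each Data row,
    # so replace it with an explicit column name.
    columns = ["Timestamp"] + payload["Name"][1:]

    # Each inner "Data" array is one row; Name supplies the headers.
    df = pd.DataFrame(payload["Data"], columns=columns)
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])

    # Write a columnar Parquet file (requires pyarrow or fastparquet).
    df.to_parquet("readings.parquet", index=False)
    ```

    The key observation is that `"Data"` is already row-oriented, so no unpivot is needed here: the other arrays only supply column metadata.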

    Please note that the above method is one way to process this kind of data in ADF. Depending on the exact nature and size of your data, there may be other methods that could be more efficient or appropriate. Also, please be aware that debugging and developing data flows in ADF can be time-consuming due to the serverless nature of the service. Make sure to plan accordingly and test each step thoroughly. If you have further questions or if something is not clear, feel free to ask!


1 additional answer

Sort by: Most helpful
  1. Vinodh247 16,916 Reputation points
    2024-02-26T05:23:22.2033333+00:00

    Hi Victor Marcelo Hinojosa Solis,

    Thanks for reaching out to Microsoft Q&A.

    • If you have JSON files where multiple JSON documents are stored as a JSON array or line-delimited JSON files (where JSON documents are separated by new-line characters), you can use the OPENROWSET function to read the JSON content.
    • Unfortunately, there isn’t a direct built-in function to convert JSON to Parquet within Synapse Analytics currently.
    • You can try to use other tools or services (such as Azure Data Factory) to perform this conversion.
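    As one example of such an external tool, a short pandas script can also fold the payload's `"Unit"` array into self-describing column headers before writing Parquet. This sketch is an illustration, not from the thread; the header format `Name (Unit)` is my own choice:

    ```python
    import json
    import pandas as pd

    # Payload shaped like the API response (truncated to one reading).
    payload = json.loads("""
    {
      "Name": ["", "Water level", "Velocity"],
      "Unit": ["", "mm", "m/s"],
      "Data": [["2024-01-15T04:00:00", 216, 0.988]]
    }
    """)

    # Zip Name and Unit into combined headers; the blank first entries of
    # both arrays correspond to the timestamp column in each Data row.
    headers = ["Timestamp"] + [
        f"{name} ({unit})"
        for name, unit in zip(payload["Name"][1:], payload["Unit"][1:])
    ]

    df = pd.DataFrame(payload["Data"], columns=headers)
    ```

    Embedding the units in the headers keeps them from being lost, since Parquet column names are the only place this metadata would otherwise survive the conversion.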

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

