I can't use multiple "unroll by" in a flatten activity in a Synapse data flow

Venkatesh S 5 Reputation points
2024-06-06T16:34:28.0266667+00:00

It was working quite well before, then started failing with "none.get", which gives no clue to the root cause. I eventually figured out that using multiple "unroll by" fields in the flatten transformation causes the issue.

My source is deeply nested JSON, which I'm trying to flatten into a Parquet file.

Attaching the snap for your reference

[screenshot: data flow]

My flatten activity's "unroll by" settings:

[screenshot: flatten "unroll by" settings]

This is what my nested JSON looks like. I'm trying to get the values out of "itemID" and "merchandise hierarchy", both of which are complex JSON objects, and if I include "merchandise hierarchy" the result is NONE, which is weird.

[screenshot: nested JSON sample]

Please help me out here. Thanks!

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

4 answers

  1. Rahul Gosavi 166 Reputation points
    2024-06-12T06:35:55.88+00:00

    Flattening multiple nested JSON structures in Azure Synapse Data Flow can sometimes lead to issues, particularly when dealing with complex nested objects. The error message "none.get" is quite vague, but it often indicates that the process is trying to access a value that doesn't exist, possibly due to issues in handling nested structures.

    Here are a few steps and tips to help you troubleshoot and resolve the issue:

    1. Isolate the Problematic Flatten Step

    Start by identifying exactly where the error occurs. Simplify your data flow by flattening one level at a time and testing each step. This can help you pinpoint which specific flatten operation is causing the issue.

    2. Check for Null or Missing Values

    Ensure that the JSON structure you're working with doesn't have null or missing values at the levels you're trying to flatten. You can add a transformation to filter out or handle null values before flattening.

    3. Use a Derived Column Transformation

    Before the flattening step, use a Derived Column transformation to add some logic to handle potential null values. For example, you could replace nulls with a default value or log them for further inspection:

    iif(isNull(column_name), 'default_value', column_name)
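Outside the designer, the same null guard can be sketched in plain Python. This is only an illustration of the logic the Derived Column applies; the record and column names below are made up, not from the original pipeline:

```python
def with_default(record, column, default="default_value"):
    """Mimic iif(isNull(column), 'default_value', column):
    return a default when the value is absent or None."""
    value = record.get(column)
    return default if value is None else value

# Hypothetical row resembling the question's data
row = {"itemID": "A-100", "merchandiseHierarchy": None}
print(with_default(row, "merchandiseHierarchy"))  # falls back to the default
print(with_default(row, "itemID"))
```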
    
    4. Incremental Flattening

    Instead of flattening multiple levels in a single step, flatten one level and then output the intermediate result to a new JSON or Parquet file. Then, create a new data flow to flatten the next level. This incremental approach can make it easier to manage and debug.
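A minimal Python sketch of this one-level-at-a-time approach (a generic helper, not Synapse code) shows why each pass is easy to inspect and debug:

```python
def flatten_one_level(record):
    """Flatten exactly one level of nesting: promote the keys of any
    dict-valued field to dotted 'parent.child' columns."""
    out = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for inner_key, inner_value in value.items():
                out[f"{key}.{inner_key}"] = inner_value
        else:
            out[key] = value
    return out

nested = {"level1": {"level2": {"field1": "value1"}}}
step1 = flatten_one_level(nested)   # one level peeled; inspect here
step2 = flatten_one_level(step1)    # fully flat after the second pass
print(step2)  # {'level1.level2.field1': 'value1'}
```

Each intermediate result can be written out and previewed before the next pass, mirroring the staging approach described above.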

    5. Schema Drift Handling

    Enable schema drift in your data flow to allow the flow to handle unexpected schema changes. This can sometimes resolve issues with missing fields during the flattening process.

    6. Debugging with Data Preview

    Use the Data Preview feature in Synapse Data Flow to inspect the intermediate results at each step. This can help you see what the data looks like after each transformation and before the flattening step.

    7. Validate JSON Structure

    Ensure that your JSON input is well-formed and consistent. Use JSON validation tools to check for errors or inconsistencies in the structure. This can help identify any issues with the data itself that might be causing the flattening to fail.
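As an example, a small standard-library Python check (the sample records here are invented) can locate malformed lines before the data flow ever runs:

```python
import json

def validate_json_lines(lines):
    """Return (line_number, error_message) for every line that fails to parse."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return errors

sample = ['{"itemID": "A-100"}', '{"itemID": }']
print(validate_json_lines(sample))  # the second line is malformed
```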

    Example Workflow

    Here’s a simplified example of a step-by-step approach:

    Initial Load:

    • Load the nested JSON into a staging table or directly into the Synapse Data Flow.

    First Flatten:

    • Add a Flatten transformation for the first level.
    • Inspect the results using Data Preview.

    Handle Nulls/Missing Values:

    • Add a Derived Column transformation to handle nulls or unexpected values.

    Second Flatten:

    • Add another Flatten transformation for the next level.
    • Again, inspect the results using Data Preview.

    Repeat as Necessary:

    • Continue flattening one level at a time, inspecting results and handling issues as they arise.

    Example Configuration

    Assuming you have a JSON structure like this:

    {
      "level1": {
        "level2": {
          "level3": {
            "field1": "value1",
            "field2": "value2"
          }
        }
      }
    }
    

    Your data flow configuration could be:

    Flatten Level 1:

    • Unroll by level1
    • Output fields: level1.level2

    Flatten Level 2:

    • Unroll by level2
    • Output fields: level2.level3

    Flatten Level 3:

    • Unroll by level3
    • Output fields: field1, field2
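The three flatten steps above amount to peeling one level per step. A plain-Python sketch of the same configuration (illustrative only, not data flow script):

```python
nested = {
    "level1": {
        "level2": {
            "level3": {"field1": "value1", "field2": "value2"}
        }
    }
}

# Flatten Level 1: unroll by level1, keeping level1.level2
step1 = nested["level1"]
# Flatten Level 2: unroll by level2, keeping level2.level3
step2 = step1["level2"]
# Flatten Level 3: unroll by level3, outputting field1 and field2
row = step2["level3"]
print(row)  # {'field1': 'value1', 'field2': 'value2'}
```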

    By following these steps and troubleshooting techniques, you should be able to isolate and resolve the issue causing the "none.get" error.


  2. Rahul Gosavi 166 Reputation points
    2024-06-12T06:40:37.6066667+00:00

    Flattening multiple nested JSON structures in Azure Synapse


  3. BhargavaGunnam-MSFT 28,526 Reputation points Microsoft Employee
    2024-06-20T17:16:17.0933333+00:00

    Hello Venkatesh S,

    The issue was due to a recent Spark version upgrade.

    The new version included updates to the flatten transformation logic to improve performance and functionality.

    These updates were not fully backward compatible with the previous version, leading to breaking changes.

    The product group (PG) is aware of this issue and is working on a fix.

    To unblock you, could you please raise a support case so that a support engineer can assist you from the backend by reverting the changes for your workspace?

    In case you don't have a support plan, please let me know so that I can provide a one-time free support request to work on this case.

    I am looking forward to hearing from you.


  4. José Miguel Lopez Becerra 6 Reputation points
    2024-06-26T15:21:20.32+00:00

    I am having the same issue. My pipelines, which include a data flow flattening XML files, were ready. To do so, I had to add two "unroll by" fields, which worked perfectly until a month ago. Today I found out about this issue, and after a full morning of troubleshooting, I located the problem and came to this post.

    It is very disappointing that our code suddenly stops working just because of a Microsoft upgrade.