I can't use multiple "unroll by" in a flatten activity in a Synapse data flow

Venkatesh S 5 Reputation points
2024-06-06T16:34:28.0266667+00:00

It was working quite well before, then started failing with "none.get", which gives no clue to the root cause. I eventually figured out that using multiple "unroll by" fields in the flatten transformation causes the issue.

My source is deeply nested JSON, which I'm trying to flatten into a Parquet file.

Attaching the snap for your reference

[screenshot: data flow]

My flatten activity's "unroll by" settings:

[screenshot: flatten "unroll by" settings]

This is what my nested JSON looks like. I'm trying to get the values out of "itemID" and "merchandise hierarchy", both of which are complex JSON objects, and if I include "merchandise hierarchy" the result is NONE, which is weird.

[screenshot: nested JSON sample]

Please help me out here. Thanks!

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

4 answers

  1. Rahul Gosavi 166 Reputation points
    2024-06-12T06:35:55.88+00:00

    Flattening multiple nested JSON structures in Azure Synapse Data Flow can sometimes lead to issues, particularly when dealing with complex nested objects. The error message "none.get" is quite vague, but it often indicates that the process is trying to access a value that doesn't exist, possibly due to issues in handling nested structures.

    Here are a few steps and tips to help you troubleshoot and resolve the issue:

    1. Isolate the Problematic Flatten Step

    Start by identifying exactly where the error occurs. Simplify your data flow by flattening one level at a time and testing each step. This can help you pinpoint which specific flatten operation is causing the issue.

    2. Check for Null or Missing Values

    Ensure that the JSON structure you're working with doesn't have null or missing values at the levels you're trying to flatten. You can add a transformation to filter out or handle null values before flattening.

    3. Use a Derived Column Transformation

    Before the flattening step, use a Derived Column transformation to add some logic to handle potential null values. For example, you could replace nulls with a default value or log them for further inspection:

    iif(isNull(column_name), 'default_value', column_name)
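Outside the designer, the same null guard can be sketched in plain Python. This is only an illustration of the logic the Derived Column applies; the record and column names below are made up, not from the original pipeline:

```python
def with_default(record, column, default="default_value"):
    """Mimic iif(isNull(column), 'default_value', column):
    return a default when the value is absent or None."""
    value = record.get(column)
    return default if value is None else value

# Hypothetical row resembling the question's data
row = {"itemID": "A-100", "merchandiseHierarchy": None}
print(with_default(row, "merchandiseHierarchy"))  # falls back to the default
print(with_default(row, "itemID"))
```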
    
    4. Incremental Flattening

    Instead of flattening multiple levels in a single step, flatten one level and then output the intermediate result to a new JSON or Parquet file. Then, create a new data flow to flatten the next level. This incremental approach can make it easier to manage and debug.
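A minimal Python sketch of this one-level-at-a-time approach (a generic helper, not Synapse code) shows why each pass is easy to inspect and debug:

```python
def flatten_one_level(record):
    """Flatten exactly one level of nesting: promote the keys of any
    dict-valued field to dotted 'parent.child' columns."""
    out = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for inner_key, inner_value in value.items():
                out[f"{key}.{inner_key}"] = inner_value
        else:
            out[key] = value
    return out

nested = {"level1": {"level2": {"field1": "value1"}}}
step1 = flatten_one_level(nested)   # one level peeled; inspect here
step2 = flatten_one_level(step1)    # fully flat after the second pass
print(step2)  # {'level1.level2.field1': 'value1'}
```

Each intermediate result can be written out and previewed before the next pass, mirroring the staging approach described above.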

    5. Schema Drift Handling

    Enable schema drift in your data flow to allow the flow to handle unexpected schema changes. This can sometimes resolve issues with missing fields during the flattening process.

    6. Debugging with Data Preview

    Use the Data Preview feature in Synapse Data Flow to inspect the intermediate results at each step. This can help you see what the data looks like after each transformation and before the flattening step.

    7. Validate JSON Structure

    Ensure that your JSON input is well-formed and consistent. Use JSON validation tools to check for errors or inconsistencies in the structure. This can help identify any issues with the data itself that might be causing the flattening to fail.
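As an example, a small standard-library Python check (the sample records here are invented) can locate malformed lines before the data flow ever runs:

```python
import json

def validate_json_lines(lines):
    """Return (line_number, error_message) for every line that fails to parse."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return errors

sample = ['{"itemID": "A-100"}', '{"itemID": }']
print(validate_json_lines(sample))  # the second line is malformed
```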

    Example Workflow

    Here’s a simplified example of a step-by-step approach:

    Initial Load:

    • Load the nested JSON into a staging table or directly into the Synapse Data Flow.

    First Flatten:

    • Add a Flatten transformation for the first level.
    • Inspect the results using Data Preview.

    Handle Nulls/Missing Values:

    • Add a Derived Column transformation to handle nulls or unexpected values.

    Second Flatten:

    • Add another Flatten transformation for the next level.
    • Again, inspect the results using Data Preview.

    Repeat as Necessary:

    • Continue flattening one level at a time, inspecting results and handling issues as they arise.

    Example Configuration

    Assuming you have a JSON structure like this:

    {
      "level1": {
        "level2": {
          "level3": {
            "field1": "value1",
            "field2": "value2"
          }
        }
      }
    }
    

    Your data flow configuration could be:

    Flatten Level 1:

    • Unroll by level1
    • Output fields: level1.level2

    Flatten Level 2:

    • Unroll by level2
    • Output fields: level2.level3

    Flatten Level 3:

    • Unroll by level3
    • Output fields: field1, field2
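The three flatten steps above amount to peeling one level per step. A plain-Python sketch of the same configuration (illustrative only, not data flow script):

```python
nested = {
    "level1": {
        "level2": {
            "level3": {"field1": "value1", "field2": "value2"}
        }
    }
}

# Flatten Level 1: unroll by level1, keeping level1.level2
step1 = nested["level1"]
# Flatten Level 2: unroll by level2, keeping level2.level3
step2 = step1["level2"]
# Flatten Level 3: unroll by level3, outputting field1 and field2
row = step2["level3"]
print(row)  # {'field1': 'value1', 'field2': 'value2'}
```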

    By following these steps and troubleshooting techniques, you should be able to isolate and resolve the issue causing the "none.get" error.


  2. Rahul Gosavi 166 Reputation points
    2024-06-12T06:40:37.6066667+00:00

    Flattening multiple nested JSON structures in Azure Synapse


  3. BhargavaGunnam-MSFT 28,526 Reputation points Microsoft Employee
    2024-06-20T17:16:17.0933333+00:00

    Hello Venkatesh S,

    The issue was due to a recent Spark version upgrade.

    The new version included updates to the flatten transformation logic to improve performance and functionality.

    These updates were not fully backward compatible with the previous version, leading to breaking changes.

    The product group (PG) is aware of this issue and is working on a fix.

    To unblock you, could you please raise a support case so that a support engineer can assist you from the backend by reverting the changes for your workspace?

    In case you don't have a support plan, please let me know so that I can provide a one-time free support request to work on this case.

    I am looking forward to hearing from you.


  4. José Miguel Lopez Becerra 6 Reputation points
    2024-06-26T15:21:20.32+00:00

    I am having the same issue. My pipelines, which include a data flow flattening XML files, were ready. To do so, I had to add two "unroll by" fields, which worked perfectly until a month ago. Today I found out about this issue, and after a full morning of troubleshooting, I located the problem and came to this post.

    It is very disappointing that our code suddenly stops working just because of a Microsoft upgrade.