ADF unable to process large JSON file

Question

Sathisha Bannihall 0

How can I fix an ADF data flow that fails to transform a 700MB JSON file? The error message is: Operation on target Execute LIA failed: Operation on target LIA failed: {"StatusCode":"DF-Executor-OutOfMemoryError","Message":"Job failed due to reason: Cluster ran into out of memory issue during execution"

phemanth 15,755 Reputation points Microsoft External Staff Moderator

2024-06-04T10:45:38.7066667+00:00

@Sathisha Bannihall We haven't heard back from you on the last response and are checking in to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

Answer 1

@Sathisha Bannihall

Thanks for using the MS Q&A platform and posting your query.

The error message indicates your ADF data flow is running out of memory while processing the 700MB JSON file. Here are some ways to address this:

Optimize Data Flow:

Reduce schema complexity: If the JSON schema is very complex with nested structures, consider simplifying it by flattening nested objects or selecting only relevant fields.
Filter data early: Use filter transformations early in the data flow to reduce the amount of data processed downstream.
Optimize transformations: Ensure your transformations are efficient and avoid unnecessary processing.
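
To make the flattening and field-selection idea concrete, here is a minimal Python sketch of the same logic applied to a single record outside ADF, for example as a pre-processing step. Inside a data flow the equivalent would normally be done with the built-in transformations; the function names `flatten` and `project`, the `_` separator, and the sample field names are all assumptions made for illustration.

```python
def flatten(record, parent_key="", sep="_"):
    """Turn nested dicts into one flat dict with joined key names.

    Lists are kept as-is; only nested objects are expanded.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def project(record, keep):
    """Flatten a record and keep only the listed columns (None if absent)."""
    flat = flatten(record)
    return {column: flat.get(column) for column in keep}


if __name__ == "__main__":
    # Hypothetical sample record, not taken from the original file.
    sample = {
        "id": 1,
        "customer": {"name": "A", "address": {"city": "X", "zip": "000"}},
        "tags": [1, 2],
    }
    print(project(sample, ["id", "customer_address_city"]))
```

Dropping unused fields this early means every later step carries narrower rows, which is the point of the "select only relevant fields" advice above.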

Increase Available Memory:

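Increase integration runtime size: Data flows run on the Azure integration runtime, which can be configured with different compute types and core counts. Consider scaling up to a configuration with more memory, such as a higher core count or the memory-optimized compute type.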
Partition the data: Partition the JSON file into smaller chunks and process them independently. This reduces the memory footprint required for the entire file at once.
Use Data Factory copy activity: For simple copy operations, consider using the Data Factory copy activity instead of a data flow. This can be more efficient for large files.
Process data in chunks: If partitioning isn't feasible, explore processing the JSON file in smaller chunks using a loop or a script activity within your data pipeline.
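
As a rough illustration of the partitioning idea, the following Python sketch splits one large JSON file into smaller JSON Lines files before the data flow runs. It assumes the file is a single top-level JSON array of records, which may not match your file; the names `iter_array_items` and `write_chunks`, the `part-00000.jsonl` naming, and the chunk size are hypothetical. It uses only the standard library and reads the file in buffers so the whole 700MB document is never held in memory at once.

```python
import json
from itertools import count, islice
from pathlib import Path


def iter_array_items(path, buf_size=1 << 20):
    """Yield elements of a top-level JSON array one at a time.

    Lenient sketch: separators between items are skipped, not validated.
    """
    decoder = json.JSONDecoder()
    with open(path, "r", encoding="utf-8") as fh:
        buf, pos, started = fh.read(buf_size), 0, False
        while True:
            # Skip whitespace and commas before the next token.
            while pos < len(buf) and buf[pos] in " \t\r\n,":
                pos += 1
            if pos < len(buf) and not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started, pos = True, pos + 1
                continue
            if pos < len(buf) and buf[pos] == "]":
                return
            try:
                item, end = decoder.raw_decode(buf, pos)
                # Require a following character so a value cut off at the
                # buffer boundary is never accepted as complete.
                complete = end < len(buf)
            except json.JSONDecodeError:
                complete = False
            if not complete:
                more = fh.read(buf_size)
                if not more:
                    raise ValueError("unexpected end of JSON array")
                buf, pos = buf[pos:] + more, 0
                continue
            yield item
            pos = end


def write_chunks(items, out_dir, records_per_chunk=50_000):
    """Write records to numbered JSON Lines files; return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    iterator, paths = iter(items), []
    for n in count():
        batch = list(islice(iterator, records_per_chunk))
        if not batch:
            break
        part = out / f"part-{n:05d}.jsonl"
        with part.open("w", encoding="utf-8") as fh:
            for record in batch:
                fh.write(json.dumps(record) + "\n")
        paths.append(part)
    return paths
```

A call such as `write_chunks(iter_array_items("big.json"), "chunks")` would produce a folder of smaller files that a pipeline could then process one at a time, for instance in a loop, so that no single run has to handle the full file.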

Troubleshooting:

Enable data preview: Enable data preview in your data flow to inspect the data size and schema after each transformation. This helps identify bottlenecks and optimize accordingly.
Monitor resource utilization: Monitor your integration runtime resource utilization during execution to understand memory consumption.

Here are some additional resources that might be helpful:

Troubleshoot connector and format issues in mapping data flows: https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-guide
ADF - Loop through a large JSON file in a dataflow: https://stackoverflow.com/questions/74099502/adf-loop-through-a-large-json-file-in-a-dataflow

Hope this helps. Do let us know if you have any further queries.
