ADF unable to process large JSON file

Sathisha Bannihall 0 Reputation points
2024-06-03T14:41:24.3766667+00:00

How can I fix an ADF data flow that fails to transform a 700MB JSON file with the error message "Operation on target Execute LIA failed: Operation on target LIA failed: {"StatusCode":"DF-Executor-OutOfMemoryError","Message":"Job failed due to reason: Cluster ran into out of memory issue during execution"?

Azure Data Factory

  1. phemanth 7,510 Reputation points Microsoft Vendor
    2024-06-03T16:22:39.32+00:00

    @Sathisha Bannihall

    Thanks for using the MS Q&A platform and posting your query.

    The error message indicates your ADF data flow is running out of memory while processing the 700MB JSON file. Here are some ways to address this:

    Optimize Data Flow:

    • Reduce schema complexity: If the JSON has deeply nested structures, flatten only the nested objects or arrays you actually need and drop irrelevant fields as early as possible.
    • Filter data early: Use filter transformations early in the data flow to reduce the amount of data processed downstream.
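    • Optimize transformations: Avoid steps that force large amounts of data to be shuffled or held in memory, such as unnecessary sorts or broadcasting a large stream in a join, lookup, or exists transformation.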
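    As a rough illustration of flattening, selecting only relevant fields, and filtering early, here is a plain-Python sketch over in-memory records. It is not ADF code: in a mapping data flow the equivalents would be the Flatten, Select, and Filter transformations. The function names (flatten, prepare) and the sample field names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested objects into one level, joining key names with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested objects
        else:
            flat[name] = value  # scalars and arrays are kept as-is
    return flat


def prepare(
    records: Iterable[Dict[str, Any]],
    keep: Sequence[str],
    predicate: Callable[[Dict[str, Any]], bool],
) -> Iterator[Dict[str, Any]]:
    """Flatten each record, drop rows failing `predicate`, then keep only the `keep` columns."""
    for record in records:
        flat = flatten(record)
        if not predicate(flat):
            continue  # filter early: rejected rows never reach later steps
        yield {col: flat[col] for col in keep if col in flat}


# Hypothetical sample data for illustration only.
orders = [
    {"id": 1, "status": "open", "customer": {"name": "Ana", "address": {"city": "Oslo"}}},
    {"id": 2, "status": "closed", "customer": {"name": "Bo", "address": {"city": "Rome"}}},
]
rows = list(prepare(orders, ["id", "customer_address_city"], lambda r: r["status"] == "open"))
print(rows)  # [{'id': 1, 'customer_address_city': 'Oslo'}]
```

    Because prepare is a generator, each record is flattened, filtered, and trimmed before the next one is read, so only the surviving columns are carried forward.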

    Increase Available Memory:

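    • Increase integration runtime size: Data flows run on a Spark cluster sized by the compute settings of the Azure integration runtime (or of the data flow activity). Consider a larger core count, or a memory-optimized compute type where available, to get more memory.
    • Partition the data: Split the JSON file into smaller files and process them independently, so no single task has to hold the entire file in memory. If the file is one large JSON document or array, consider converting it to JSON Lines (one document per line), which Spark-based data flows can generally split across the cluster instead of parsing as a whole.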
    • Use Data Factory copy activity: For simple copy operations, consider using the Data Factory copy activity instead of a data flow. This can be more efficient for large files.
    • Process data in chunks: If partitioning isn't feasible, explore processing the JSON file in smaller chunks using a loop or a script activity within your data pipeline.
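    If the 700MB file is one top-level JSON array, one way to split it is to stream the array and write the records out as JSON Lines files that can then be processed independently (for example, in a ForEach loop). The sketch below runs outside ADF, uses only the Python standard library, and assumes a UTF-8 file whose top-level value is an array of JSON objects. iter_json_array, split_to_jsonl, and the part-NNNNN.jsonl naming are hypothetical.

```python
import itertools
import json
import os
from typing import Any, Iterator, List

_WS = " \t\n\r"  # JSON whitespace characters


def iter_json_array(path: str, buf_size: int = 1 << 20) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one by one.

    Assumes the elements are JSON objects (records). Only a bounded buffer
    plus the record being decoded is held in memory, not the whole file.
    """
    decoder = json.JSONDecoder()
    with open(path, "r", encoding="utf-8") as f:
        buf, pos, eof = "", 0, False

        def refill() -> None:
            # Drop already-consumed text and append the next chunk of the file.
            nonlocal buf, pos, eof
            chunk = f.read(buf_size)
            buf, pos = buf[pos:] + chunk, 0
            eof = not chunk

        def peek() -> str:
            # Skip whitespace and return the next character ('' at end of file).
            nonlocal pos
            while True:
                while pos < len(buf) and buf[pos] in _WS:
                    pos += 1
                if pos < len(buf) or eof:
                    return buf[pos:pos + 1]
                refill()

        if peek() != "[":
            raise ValueError("expected a top-level JSON array")
        pos += 1
        if peek() == "]":
            return  # empty array
        while True:
            # Decode one element, reading more of the file while it is incomplete.
            while True:
                try:
                    obj, end = decoder.raw_decode(buf, pos)
                    # A value ending exactly at the buffer edge may be cut off.
                    if end < len(buf) or eof:
                        break
                except json.JSONDecodeError:
                    if eof:
                        raise
                refill()
            yield obj
            pos = end
            sep = peek()
            if sep == "]":
                return
            if sep != ",":
                raise ValueError(f"expected ',' or ']', found {sep!r}")
            pos += 1
            peek()  # skip whitespace before the next element


def split_to_jsonl(path: str, out_dir: str, records_per_file: int = 100_000) -> List[str]:
    """Write the array's records to JSON Lines files of at most records_per_file lines."""
    os.makedirs(out_dir, exist_ok=True)
    records = iter_json_array(path)
    parts: List[str] = []
    for first in records:  # each iteration starts a new part file
        part = os.path.join(out_dir, f"part-{len(parts):05d}.jsonl")
        with open(part, "w", encoding="utf-8") as out:
            for rec in itertools.chain([first], itertools.islice(records, records_per_file - 1)):
                out.write(json.dumps(rec, ensure_ascii=False) + "\n")
        parts.append(part)
    return parts
```

    The design choice here is to decode one element at a time with json.JSONDecoder.raw_decode from a bounded buffer, so memory use is roughly the buffer size plus the largest single record rather than the whole file. A record larger than the buffer still works, but it is re-parsed on each refill. split_to_jsonl returns the list of written paths, which could then be uploaded to the storage location the data flow reads from.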

    Troubleshooting:

    • Use data preview: With data flow debug mode enabled, use data preview to inspect the schema and a sample of rows after each transformation. This helps spot where rows or columns multiply (for example, after flattening an array) so you can optimize accordingly.
    • Monitor resource utilization: Monitor your integration runtime resource utilization during execution to understand memory consumption.
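    Alongside data preview and monitoring, it can help to check locally whether the input consists of many modest records or a few very large ones, since a single oversized record generally still has to fit in memory on one node. A minimal sketch, assuming the file is already in JSON Lines format; profile_jsonl is a hypothetical helper, not part of ADF.

```python
import json
from typing import Dict


def profile_jsonl(path: str) -> Dict[str, int]:
    """Count records and measure record sizes in a JSON Lines file, one line at a time."""
    records = largest = total = 0
    with open(path, "rb") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            json.loads(line)  # raises json.JSONDecodeError on an invalid line
            records += 1
            largest = max(largest, len(line))
            total += len(line)
    return {"records": records, "largest_record_bytes": largest, "total_bytes": total}
```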

    Hope this helps. Do let us know if you have any further queries.