ADF - Load 2 GB JSON file into blob

Karthik Malla 1 Reputation point
2023-01-05T18:02:01.48+00:00

@Annu Kumari @Boyina, Ramakrishna
I am trying to load a 2 GB JSON file but am facing the errors below. I followed all the steps (flattening) and was able to load a 3 MB JSON file without errors.
Error 1:
Cluster ran into out of memory issue during execution, please retry using an integration runtime with bigger core count and/or memory optimized compute type
Error 2:
Job failed due to reason: Cluster ran into out of memory issue during execution. Also, Please note that the dataflow has one or more custom partitioning schemes.
The transformation(s) using custom partition schemes: source1.

  1. Please retry using an integration runtime with bigger core count and/or memory optimized compute type.
  2. Please retry using different partitioning schemes and/or number of partitions.
  3. Ensure the different values that fall under the same partition can fit within a 2GB limit.

can someone please help?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2023-01-16T05:55:07.78+00:00

    Hi Karthik Malla ,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As I understand your query, you are facing an issue while loading a JSON file of around 2 GB using a mapping data flow in Azure Data Factory. Please let me know if my understanding is incorrect.

    For better understanding, could you please confirm whether you are running the data flow in debug mode or via a trigger?

    • When you execute a data flow activity in a pipeline in debug mode, you are not using the activity's compute settings. The "Run on" Azure IR setting on the activity is only honored during triggered pipeline runs; debug sessions use the Azure IR associated with your debug settings, not the activity's IR.
    • You can test your activity in pipeline debug mode with a larger compute size either by (a) using a larger Azure IR for your debug session, or (b) clicking Debug > Use activity runtime. That executes a true test of your data flow activity. See the sketch after this list for where the activity-level IR reference lives.
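
    For reference, the "Run on" Azure IR reference sits on the Execute Data Flow activity inside the pipeline JSON. The snippet below is a minimal sketch, assuming a placeholder data flow <YourDataFlow> and a placeholder Azure IR <YourMemoryOptimizedIR>; the exact property set (for example the optional compute override) can vary with your factory's JSON schema version, so treat it as illustrative rather than definitive.

        {
          "name": "ExecuteLargeJsonDataFlow",
          "type": "ExecuteDataFlow",
          "typeProperties": {
            "dataFlow": {
              "referenceName": "<YourDataFlow>",
              "type": "DataFlowReference"
            },
            "integrationRuntime": {
              "referenceName": "<YourMemoryOptimizedIR>",
              "type": "IntegrationRuntimeReference"
            },
            "compute": {
              "computeType": "MemoryOptimized",
              "coreCount": 16
            },
            "traceLevel": "Fine"
          }
        }

    In a triggered run this IR reference is honored; in a plain debug run it is ignored unless you pick Debug > Use activity runtime.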

    Error 1: Cluster ran into out of memory issue during execution, please retry using an integration runtime with bigger core count and/or memory optimized compute type

    • This error occurs because the cluster is running out of memory, as stated in the error message.
    • Debug clusters are meant for development. Use data sampling and an appropriate compute type and size to run the payload.
    • For performance tips, see the Mapping data flows performance and tuning guide, which highlights various ways to tune and optimize your data flows so that they meet your performance benchmarks. A sketch of a memory-optimized Azure IR definition follows this list.
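
    If the triggered run (or a debug run using the activity runtime) still runs out of memory, point the activity at a bigger or memory-optimized Azure IR. Below is a minimal sketch of such an integration runtime definition; the name, core count (16) and time-to-live (10 minutes) are placeholder choices, and the core counts actually available depend on your subscription.

        {
          "name": "<YourMemoryOptimizedIR>",
          "properties": {
            "type": "Managed",
            "typeProperties": {
              "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                  "computeType": "MemoryOptimized",
                  "coreCount": 16,
                  "timeToLive": 10
                }
              }
            }
          }
        }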

    Error 2: Job failed due to reason: Cluster ran into out of memory issue during execution. Also, Please note that the dataflow has one or more custom partitioning schemes.

    Defining the file partitioning at the sink can be helpful. Kindly check this article, which talks about splitting your large file across partitioned files so that you can process and move the file in pieces. A sketch of a data flow with round-robin partitioning at the sink is shown after the screenshots below.

    (Screenshots: filesplit2 and filesplit3, showing the sink file split/partition settings)
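
    As a rough illustration of the partitioning idea, here is a trimmed sketch of a mapping data flow definition in which the source is left on its current partitioning and the sink writes the output as 20 round-robin partitioned files. The dataset names, the partition count, and the script lines are placeholders based on what the Optimize tab typically generates, not an exact reproduction of your data flow.

        {
          "name": "SplitLargeJsonDF",
          "properties": {
            "type": "MappingDataFlow",
            "typeProperties": {
              "sources": [
                {
                  "dataset": { "referenceName": "<LargeJsonDataset>", "type": "DatasetReference" },
                  "name": "source1"
                }
              ],
              "sinks": [
                {
                  "dataset": { "referenceName": "<PartitionedOutputDataset>", "type": "DatasetReference" },
                  "name": "sink1"
                }
              ],
              "scriptLines": [
                "source(allowSchemaDrift: true,",
                "     validateSchema: false) ~> source1",
                "source1 sink(allowSchemaDrift: true,",
                "     validateSchema: false,",
                "     partitionBy('roundRobin', 20)) ~> sink1"
              ]
            }
          }
        }

    Since the error calls out a custom partition scheme on source1, it can also help to set the source's Optimize tab back to "Use current partitioning" and do the splitting only at the sink.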

    Additional resource:

    https://stackoverflow.com/questions/69896806/df-executor-outofmemoryerror-in-synapse-pipeline


    Hope it helps. Please consider clicking Accept Answer and Up-Vote, as accepted answers help the community as well.

