ADF - Load 2 GB JSON file into blob

Karthik Malla 1 Reputation point
2023-01-05T18:02:01.48+00:00

@Annu Kumari @Boyina, Ramakrishna
I am trying to load a 2 GB JSON file but am facing the errors below. I followed all the steps (flattening) and was able to load a 3 MB JSON file without errors.
Error 1:
Cluster ran into out of memory issue during execution, please retry using an integration runtime with bigger core count and/or memory optimized compute type
Error 2:
Job failed due to reason: Cluster ran into out of memory issue during execution. Also, Please note that the dataflow has one or more custom partitioning schemes.
The transformation(s) using custom partition schemes: source1.

  1. Please retry using an integration runtime with bigger core count and/or memory optimized compute type.
  2. Please retry using different partitioning schemes and/or number of partitions.
  3. Ensure the different values that fall under the same partition can fit within a 2GB limit.

can someone please help?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2023-01-16T05:55:07.78+00:00

    Hi Karthik Malla ,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As I understand your query, you are facing an issue while loading a JSON file of around 2 GB using a mapping data flow in Azure Data Factory. Please let me know if my understanding is incorrect.

    For better understanding, could you please confirm whether you are running the data flow in debug mode or via a trigger?

    • When you execute a data flow activity in a pipeline in debug mode, you are not using the activity's compute settings. The "Run on" Azure IR setting on the activity is only honored during triggered pipeline runs; debug sessions use the Azure IR associated with your debug settings, not the activity's IR.
    • You can test your activity in pipeline debug mode with a larger compute size either by (a) using a larger Azure IR for your debug session, or (b) clicking Debug > Use activity runtime. That executes a true test of your data flow activity. See the sketch after this list for where the activity-level IR reference lives.
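
    For reference, the "Run on" Azure IR reference sits on the Execute Data Flow activity inside the pipeline JSON. The snippet below is a minimal sketch, assuming a placeholder data flow <YourDataFlow> and a placeholder Azure IR <YourMemoryOptimizedIR>; the exact property set (for example the optional compute override) can vary with your factory's JSON schema version, so treat it as illustrative rather than definitive.

        {
          "name": "ExecuteLargeJsonDataFlow",
          "type": "ExecuteDataFlow",
          "typeProperties": {
            "dataFlow": {
              "referenceName": "<YourDataFlow>",
              "type": "DataFlowReference"
            },
            "integrationRuntime": {
              "referenceName": "<YourMemoryOptimizedIR>",
              "type": "IntegrationRuntimeReference"
            },
            "compute": {
              "computeType": "MemoryOptimized",
              "coreCount": 16
            },
            "traceLevel": "Fine"
          }
        }

    In a triggered run this IR reference is honored; in a plain debug run it is ignored unless you pick Debug > Use activity runtime.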

    Error 1: Cluster ran into out of memory issue during execution, please retry using an integration runtime with bigger core count and/or memory optimized compute type

    • This error occurs because the cluster is running out of memory, as stated in the error message.
    • Debug clusters are meant for development. Use data sampling and an appropriate compute type and size to run the payload.
    • For performance tips, see the Mapping data flows performance and tuning guide, which highlights various ways to tune and optimize your data flows so that they meet your performance benchmarks. A sketch of a memory-optimized Azure IR definition follows this list.
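
    If the triggered run (or a debug run using the activity runtime) still runs out of memory, point the activity at a bigger or memory-optimized Azure IR. Below is a minimal sketch of such an integration runtime definition; the name, core count (16) and time-to-live (10 minutes) are placeholder choices, and the core counts actually available depend on your subscription.

        {
          "name": "<YourMemoryOptimizedIR>",
          "properties": {
            "type": "Managed",
            "typeProperties": {
              "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                  "computeType": "MemoryOptimized",
                  "coreCount": 16,
                  "timeToLive": 10
                }
              }
            }
          }
        }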

    Error 2: Job failed due to reason: Cluster ran into out of memory issue during execution. Also, Please note that the dataflow has one or more custom partitioning schemes.

    Defining the file partitioning at the sink can be helpful. Kindly check this article, which talks about splitting your large file across partitioned files so that you can process and move the file in pieces. A sketch of a data flow with round-robin partitioning at the sink is shown after the screenshots below.

    (Screenshots: filesplit2 and filesplit3, showing the sink file split/partition settings)
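
    As a rough illustration of the partitioning idea, here is a trimmed sketch of a mapping data flow definition in which the source is left on its current partitioning and the sink writes the output as 20 round-robin partitioned files. The dataset names, the partition count, and the script lines are placeholders based on what the Optimize tab typically generates, not an exact reproduction of your data flow.

        {
          "name": "SplitLargeJsonDF",
          "properties": {
            "type": "MappingDataFlow",
            "typeProperties": {
              "sources": [
                {
                  "dataset": { "referenceName": "<LargeJsonDataset>", "type": "DatasetReference" },
                  "name": "source1"
                }
              ],
              "sinks": [
                {
                  "dataset": { "referenceName": "<PartitionedOutputDataset>", "type": "DatasetReference" },
                  "name": "sink1"
                }
              ],
              "scriptLines": [
                "source(allowSchemaDrift: true,",
                "     validateSchema: false) ~> source1",
                "source1 sink(allowSchemaDrift: true,",
                "     validateSchema: false,",
                "     partitionBy('roundRobin', 20)) ~> sink1"
              ]
            }
          }
        }

    Since the error calls out a custom partition scheme on source1, it can also help to set the source's Optimize tab back to "Use current partitioning" and do the splitting only at the sink.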

    Additional resource:

    https://stackoverflow.com/questions/69896806/df-executor-outofmemoryerror-in-synapse-pipeline


    Hope it helps. Please consider clicking Accept Answer and Up-Vote, as accepted answers help the community as well.

