Hi Karthik Malla,
Welcome to the Microsoft Q&A platform, and thanks for posting your question here.
As I understand your query, you are facing an issue while loading JSON data larger than 2 GB using a mapping data flow in Azure Data Factory. Please let me know if my understanding is incorrect.
For better understanding, could you please confirm whether you are running the data flow in debug mode or via a triggered run?
- When you execute a data flow activity in a pipeline in debug mode, you are not using the activity settings for compute. The "Run on" Azure IR setting on the activity is only honored during triggered pipeline runs; debug sessions use the Azure IR associated with your debug settings, not the activity's IR.
- You can test your activity in pipeline debug mode with a larger compute size by either (a) using a larger Azure IR for your debug session, or (b) clicking Debug > Use activity runtime, which executes a true test of your data flow activity (see the sketch after this list).
- This error is caused by the cluster running out of memory, as stated in the error message.
- Debug clusters are meant for development; use data sampling while developing, and run the full payload with an appropriate compute type and size.
- For performance tips, see the Mapping data flows performance and tuning guide, which highlights various ways to tune and optimize your data flows so that they meet your performance benchmarks.
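For reference, here is a minimal, hedged sketch of what the compute settings on an Execute Data Flow activity look like in the pipeline JSON. The activity, data flow, and integration runtime names below are placeholders, and 16 memory optimized cores is only an example; choose a size that fits your data volume.

```json
{
    "name": "RunLargeJsonDataFlow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataflow": {
            "referenceName": "LoadLargeJson",
            "type": "DataFlowReference"
        },
        "compute": {
            "computeType": "MemoryOptimized",
            "coreCount": 16
        },
        "integrationRuntime": {
            "referenceName": "MemoryOptimizedDataFlowIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```

Keep in mind that these settings only take effect for triggered runs; in debug mode, either attach your debug session to a larger Azure IR or use Debug > Use activity runtime so the run picks up the compute above.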
Error 2: "Job failed due to reason: Cluster ran into out of memory issue during execution. Also, Please note that the dataflow has one or more custom partitioning schemes."
Defining the file partitioning at the sink can be helpful. Kindly check this article, which explains how to split your large file into partitioned files so that you can process and move the data in pieces; a rough sketch of what that looks like follows below.
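As a rough, hedged sketch only (the data flow name, stream names, and partition count are placeholders, and the script is heavily abbreviated), partitioning the sink, for example into 20 round-robin partitions, ends up in the data flow's JSON definition roughly like this. In practice you would normally configure it from the sink's Optimize tab (Set partitioning) rather than editing the script by hand.

```json
{
    "name": "SplitLargeJson",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "scriptLines": [
                "source(allowSchemaDrift: true,",
                "    validateSchema: false) ~> sourceLargeJson",
                "sourceLargeJson sink(allowSchemaDrift: true,",
                "    validateSchema: false,",
                "    partitionBy('roundRobin', 20)) ~> sinkPartitioned"
            ]
        }
    }
}
```

Writing the output as many smaller partitioned files instead of one very large file reduces memory pressure on the cluster and lets downstream steps process the data in pieces.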
Hope this helps. Please consider clicking "Accept Answer" and upvoting, as accepted answers help the community as well.