How to split a JSON file based on size in ADF

Anonymous
2023-08-21T13:21:39.0533333+00:00

I have a JSON file larger than 2 MB. I need to split it into multiple JSON files using an ADF pipeline. For example, I need to split a JSON file like the one below:

Example:

{
  "Id": 12,
  "Name": "abc",
  "Student": [
    {
      "SId": 123,
      "SName": "xyz"
    },
    {
      "SId": 234,
      "SName": "qwe"
    }
  ]
}

Azure Data Factory

2 answers

  1. KranthiPakala-MSFT 46,627 Reputation points Microsoft Employee
    2023-08-21T20:19:30.99+00:00

    @Anonymous Thanks for using the Microsoft Q&A forum and posting your query.

    To split large files, you will have to use a mapping data flow. Even if you have a complex JSON structure with nested arrays, you can flatten the JSON first and then create partitioned files in your sink, based on the partition settings of the data flow's sink transformation.
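
    The data flow itself is configured in the ADF UI rather than written as code, but the flatten-then-partition logic it performs is roughly the following. This is a minimal Python sketch for illustration only; the file names and the records-per-file count are assumptions, not ADF itself:

    import json

    # Read the single large JSON document ("input.json" is a placeholder name).
    with open("input.json") as f:
        doc = json.load(f)

    # Flatten: one record per element of the nested Student array,
    # carrying the top-level fields along with it.
    records = [
        {"Id": doc["Id"], "Name": doc["Name"], **student}
        for student in doc["Student"]
    ]

    # Partition: write fixed-size chunks to separate JSON files.
    records_per_file = 1000  # assumed chunk size; tune to hit your target file size
    for i in range(0, len(records), records_per_file):
        chunk = records[i:i + records_per_file]
        with open(f"part-{i // records_per_file:05d}.json", "w") as out:
            json.dump(chunk, out, indent=2)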


    For a detailed explanation, please refer to this video by a community contributor: Azure Data Factory - Split/Partition big file to smaller ones using Mapping data flow

    Please note that the sample in the video uses CSV/TXT data, but you can follow a similar approach for JSON.

    Here is another resource regarding the same requirement: Split a json file based on number of records in ADF

    Hope this info helps.


    Please don’t forget to Accept Answer and select Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.


  2. Anonymous
    2023-08-23T19:49:09.62+00:00

    How can we use Spark (Azure Databricks Notebooks or Azure Synapse Notebooks) instead of ADF?
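
    For reference, the same flatten-and-partition approach can be sketched in PySpark for a Databricks or Synapse notebook. This is a minimal sketch, not a definitive implementation; the storage paths and the partition count are assumptions:

    from pyspark.sql.functions import col, explode

    # "spark" is the SparkSession that Databricks/Synapse notebooks provide.
    # multiLine=True is needed because the source is a single JSON document,
    # not newline-delimited JSON. The abfss:// paths below are placeholders.
    df = (spark.read
          .option("multiLine", True)
          .json("abfss://container@account.dfs.core.windows.net/input/large.json"))

    # Flatten the nested Student array into one row per student,
    # keeping the top-level Id and Name columns.
    flat = (df
            .withColumn("Student", explode(col("Student")))
            .select("Id", "Name",
                    col("Student.SId").alias("SId"),
                    col("Student.SName").alias("SName")))

    # Repartition and write: each partition becomes its own part-*.json file.
    (flat.repartition(4)
         .write.mode("overwrite")
         .json("abfss://container@account.dfs.core.windows.net/output/split/"))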

