Merging Multiple JSON Files Using Azure Data Factory Copy Activity

Bansal, Ankit Kumar 30 Reputation points
2023-08-25T14:56:50.2033333+00:00
I am currently facing a challenge related to merging multiple JSON files stored in Blob storage. My goal is to combine these JSON files into a single, unified JSON output. To achieve this, I am contemplating the utilization of the Azure Data Factory Copy activity.

 

For context, here is an example of the JSON files I am dealing with:

 

test1.json:

 

{
  "requestId": "5a30#18a2ca70781",
  "result": [
    {"id": 231581142, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "1"}, {"name": "Choice Number", "value": "1"}]},
    {"id": 231581142, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "1"}, {"name": "Choice Number", "value": "1"}]}
  ],
  "success": true,
  "nextPageToken": "a",
  "moreResult": true
}
test2.json:

 

{
  "requestId": "5a30#18a2ca70781",
  "result": [
    {"id": 1, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "2"}, {"name": "Choice Number", "value": "2"}]},
    {"id": 3, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "1"}, {"name": "Choice Number", "value": "7"}]}
  ],
  "success": true,
  "nextPageToken": "a",
  "moreResult": true
}
test3.json:

 

{
  "requestId": "5a30#18a2ca70781",
  "result": [
    {"id": 1, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "2"}, {"name": "Choice Number", "value": "2"}]},
    {"id": 3, "activityTypeId": 5, "attributes": [{"name": "Step ID", "value": "2"}, {"name": "Choice Number", "value": "9"}]}
  ],
  "success": true,
  "nextPageToken": "a",
  "moreResult": true
}

 

My approach thus far involves using a wildcard path as the source to read these JSON files and designating a separate file named result.json as the sink location. In the copy activity settings, I've chosen the "merge file" behavior.

 

However, the resulting output is not a valid JSON format. The copy activity seems to treat each individual file as text and simply appends them together. Here's an example of the undesired output:

 

json
Copy code
{"requestId":"5a30#18a2ca70781","result":[...]} // JSON content from test1.json
{"requestId":"5a30#18a2ca70781","result":[...]} // JSON content from test2.json
{"requestId":"5a30#18a2ca70781","result":[...]} // JSON content from test3.json
I'm seeking advice and guidance on how to properly utilize the Azure Data Factory Copy activity to merge these JSON files into a correct and valid JSON format, where the content is unified under a single JSON structure.
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
{count} vote

Accepted answer
  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2023-08-30T17:49:57.6166667+00:00

    Hi Bansal, Ankit Kumar,

    Thank you for posting query in Microsoft Q&A Platform.

    In Copy activity, under Sink tab, you should consider using File Pattern value as Array of Objects as shown in below image. Please check this documentation.

    User's image

    Hope this helps. Please let me know if any further queries.


    Please consider hitting Accept Answer button. Accepted answers help community as well.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.