I am currently facing a challenge related to merging multiple JSON files stored in Blob storage. My goal is to combine these JSON files into a single, unified JSON output. To achieve this, I am contemplating the utilization of the Azure Data Factory Copy activity.
For context, here is an example of the JSON files I am dealing with:
test1.json:
{
"requestId": "5a30#18a2ca70781",
"result": [
{"id": 231581142, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "1"}, {"name": "Choice Number", "value": "1"}]},
{"id": 231581142, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "1"}, {"name": "Choice Number", "value": "1"}]}
],
"success": true,
"nextPageToken": "a",
"moreResult": true
}
test2.json:
{
"requestId": "5a30#18a2ca70781",
"result": [
{"id": 1, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "2"}, {"name": "Choice Number", "value": "2"}]},
{"id": 3, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "1"}, {"name": "Choice Number", "value": "7"}]}
],
"success": true,
"nextPageToken": "a",
"moreResult": true
}
test3.json:
{
"requestId": "5a30#18a2ca70781",
"result": [
{"id": 1, "activityTypeId": 10, "attributes": [{"name": "Step ID", "value": "2"}, {"name": "Choice Number", "value": "2"}]},
{"id": 3, "activityTypeId": 5, "attributes": [{"name": "Step ID", "value": "2"}, {"name": "Choice Number", "value": "9"}]}
],
"success": true,
"nextPageToken": "a",
"moreResult": true
}
My approach thus far involves using a wildcard path as the source to read these JSON files and designating a separate file named result.json as the sink location. In the copy activity settings, I've chosen the "merge file" behavior.
However, the resulting output is not a valid JSON format. The copy activity seems to treat each individual file as text and simply appends them together. Here's an example of the undesired output:
json
Copy code
{"requestId":"5a30#18a2ca70781","result":[...]} // JSON content from test1.json
{"requestId":"5a30#18a2ca70781","result":[...]} // JSON content from test2.json
{"requestId":"5a30#18a2ca70781","result":[...]} // JSON content from test3.json
I'm seeking advice and guidance on how to properly utilize the Azure Data Factory Copy activity to merge these JSON files into a correct and valid JSON format, where the content is unified under a single JSON structure.