Merging millions of JSON files from Azure Blob Storage

Jardar Maatje 1 Reputation point
2020-09-14T18:29:53.223+00:00

Hi,

We have an Azure blob container with millions of small JSON files. I have successfully set up a copy task in Azure Data Factory that merges these files into one file that will be more manageable for further processing, preferably in Data Lake. Right now the destination is a CSV file in another container.
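
To make the goal concrete, here is a minimal sketch of the kind of merge described above. It is not the Data Factory copy task itself: it assumes the small JSON documents are already available as strings, and the function name `merge_json_documents` and the JSON Lines output format are illustrative choices, not part of the current setup.

```python
import io
import json
from typing import Iterable, TextIO


def merge_json_documents(documents: Iterable[str], sink: TextIO) -> int:
    """Write each JSON document to `sink` as one compact line; return the count.

    Hypothetical helper: in the real scenario the documents would come
    from blobs in the container rather than from an in-memory list.
    """
    count = 0
    for raw in documents:
        record = json.loads(raw)  # raises ValueError on a malformed document
        # Compact separators keep each record on a single line (JSON Lines).
        sink.write(json.dumps(record, separators=(",", ":")) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Two stand-in "small files"; the second spans several lines on purpose.
    small_files = ['{"id": 1, "value": "a"}', '{"id": 2,\n "value": "b"}']
    merged = io.StringIO()
    merge_json_documents(small_files, merged)
    print(merged.getvalue(), end="")
```

One record per line is only one possible target layout; whether it or CSV suits the later analysis better is part of what I am asking.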

However, this takes ages, so I need help finding the most performant way to approach this.

So what is the most efficient way to do this, both in terms of setup and in choosing the sink and format?

In the end I would like to have the data stored in Azure Data Lake for analysis.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.