Azure Data Lake Gen2 - Use Case Advice
I am collecting weather data (history and forecast) from a third-party web service. Since there will be a lot of data and it will not be accessed frequently, I was planning to use Azure Data Lake Storage Gen2 (built on Blob Storage) and store the data as JSON files. My thinking is that this will be cheaper than an Azure SQL database.
I have read that it is best to have larger files in Data Lake. The amount of data collected each hour is relatively small, so I was thinking of having one file per month. But this means that each hour, when I collect data, I need to add it to the current month's file. What is the best way to do this? Should I read the file, add the new data to the data from the file, and then overwrite the file? That seems easiest, but also inefficient. Is there a better way, i.e. a way to append? Or should I just live with having smaller files and create a new file each hour?
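For what it's worth, ADLS Gen2 does support true appends (unlike flat block blobs), so the read-modify-overwrite cycle described above can be avoided. A minimal sketch of the hourly-append pattern, assuming the `azure-storage-file-datalake` SDK's `DataLakeFileClient` and a hypothetical `weather/YYYY/MM.jsonl` naming scheme (one JSON object per line, so appending never requires re-parsing the file):

```python
import json
from datetime import datetime

def monthly_path(ts: datetime) -> str:
    # One file per month, e.g. "weather/2024/05.jsonl" (naming is illustrative).
    return f"weather/{ts:%Y}/{ts:%m}.jsonl"

def append_record(file_client, record: dict) -> None:
    """Append one hourly record as a JSON line.

    `file_client` is assumed to be an already-created
    azure.storage.filedatalake.DataLakeFileClient for the month's file.
    ADLS Gen2 appends in place, so the existing contents are never
    re-read or rewritten.
    """
    line = (json.dumps(record) + "\n").encode("utf-8")
    # Find the current end of the file, append there, then commit.
    offset = file_client.get_file_properties().size
    file_client.append_data(line, offset=offset, length=len(line))
    file_client.flush_data(offset + len(line))
```

The JSON Lines layout matters here: because each record is a self-contained line, an append is valid without reading the existing data, whereas a single JSON array would force the rewrite you are trying to avoid.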
And, is this even an appropriate use case for Data Lake?