load delta parquet files

arkiboys 9,686 Reputation points
2022-04-12T12:04:14.17+00:00

Hello,
At present, a data flow in ADF loads data daily into ADLS Gen2 as delta parquet files, laid out as follows:
container/folderName/companyNameDetails1 --> contains the delta parquet files for this companyName
container/folderName/companyNameDetails2 --> contains the delta parquet files for this companyName
container/folderName/companyNameDetails3 --> contains the delta parquet files for this companyName

Then, once all the data is loaded, I query folderName and filter for the required companyName, etc.

Question:
Is the above structure good, or would it be better to have a single folder and load the details for all companies into it?
For example:
container/folderName/companyNamesDetails --> contains the delta parquet files for all companies

Thanks

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. HimanshuSinha-msft 19,471 Reputation points Microsoft Employee
    2022-04-13T18:46:04.783+00:00

    Hello @arkiboys,
    Thanks for the question and for using the MS Q&A platform.
    As we understand it, the question is whether to keep one folder per company (Option 1: container/folderName/companyNameDetails1, container/folderName/companyNameDetails2, and so on) or to use a single folder for all companies (Option 2: container/folderName/companyNamesDetails). Please let us know if that is not accurate.

    You did not mention the data volume, but I am guessing it is not very large. From a storage perspective, it makes no difference which option you take; the cost is the same. From a querying perspective, Option 2 (the single companyNamesDetails folder) is easier, because you do not have to add the extra companyNameDetails1/2/3 folder to each path. I would suggest going with Option 2.
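
As a rough illustration of the querying difference, here is a minimal sketch in plain Python (not Spark or ADF code). The root path comes from the question; the helper names and the `companyName` column are assumptions made for the example, and it presumes the data itself carries a company identifier to filter on.

```python
from typing import Dict, List

# Root path copied from the question; everything else here is illustrative.
ROOT = "container/folderName"


def option1_paths(root: str, folders: List[str]) -> List[str]:
    """Option 1: one folder per company, so every company is a separate path."""
    return [f"{root}/{folder}" for folder in folders]


def option2_path(root: str) -> str:
    """Option 2: a single shared folder holding the data for all companies."""
    return f"{root}/companyNamesDetails"


def filter_companies(
    rows: List[Dict[str, str]], wanted: List[str]
) -> List[Dict[str, str]]:
    """With Option 2, companies are selected by a column filter, not by path.

    Assumes each row has a 'companyName' column (hypothetical name).
    """
    wanted_set = set(wanted)
    return [row for row in rows if row["companyName"] in wanted_set]


if __name__ == "__main__":
    folders = ["companyNameDetails1", "companyNameDetails2", "companyNameDetails3"]
    print(option1_paths(ROOT, folders))  # three paths to read and combine
    print(option2_path(ROOT))            # one path to read

    rows = [
        {"companyName": "A", "detail": "x"},
        {"companyName": "B", "detail": "y"},
    ]
    print(filter_companies(rows, ["A"]))
```

With Option 1 the list of paths grows with every new company, while with Option 2 the path stays fixed and only the filter values change.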

    Please do let me know if you have any queries.
    Thanks
    Himanshu

