How to go from raw to curated in ADLS?
If at the raw level we have:
Do we have to keep all versions in the last level, or only the latest view of the data?
If we keep only the latest version, we will have -> Curated\CRM\Customer\crm_customer.parquet
Keeping a structure with no date in the path would let us link the Synapse layer directly to this curated space using the external object feature, wouldn't it?
Or do you have another approach to manage this?
Hello @Le Fur, Herve,
Thanks for the question and for using the MS Q&A platform.
Just trying to get more clarity on the ask: does the raw customer data in 2022_03_07.csv also contain the records from 2022_03_06.csv? In general, whether to keep the raw data for future use depends entirely on the business requirement. To simplify further, instead of keeping multiple files you could consolidate them into a single bulk file, provided each individual file holds unique data. But if your consumer application only depends on the curated file you mentioned,
Curated\CRM\Customer\crm_customer.parquet, then it is fine to keep it as you described, unless you have a huge volume of data that needs partitioning.
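As a minimal sketch of the "latest view only" option, assuming the raw snapshots are named by date as in the question (2022_03_06.csv, 2022_03_07.csv) and that each file is a full snapshot: the code below picks the most recent raw file and maps it to the fixed, date-free curated path. The Raw/CRM/Customer folder and the helper names are hypothetical, and the actual read and Parquet write would be done by whatever engine you use.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Assumed naming convention: each raw snapshot ends with YYYY_MM_DD.csv
SNAPSHOT_RE = re.compile(r"(\d{4})_(\d{2})_(\d{2})\.csv$")


def snapshot_date(path):
    """Return the date encoded in a raw file name, or None if it does not match."""
    m = SNAPSHOT_RE.search(path)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    return date(year, month, day)


def latest_snapshot(paths):
    """Pick the raw file with the most recent date in its name."""
    dated = [(snapshot_date(p), p) for p in paths]
    dated = [(d, p) for d, p in dated if d is not None]
    if not dated:
        raise ValueError("no dated snapshot files found")
    return max(dated)[1]


def curated_target(source, entity):
    """Build the date-free curated path, e.g. Curated/CRM/Customer/crm_customer.parquet."""
    file_name = f"{source.lower()}_{entity.lower()}.parquet"
    return str(PurePosixPath("Curated", source, entity, file_name))


raw_files = [
    "Raw/CRM/Customer/2022_03_06.csv",  # hypothetical raw folder layout
    "Raw/CRM/Customer/2022_03_07.csv",
]
source_file = latest_snapshot(raw_files)
target_file = curated_target("CRM", "Customer")
print(source_file, "->", target_file)
```

Because the target path never changes, each run simply overwrites the same curated file, and anything pointing at that path keeps seeing the latest view.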
Sorry for the late answer, but I was on holiday ;-)
Your question: does the raw customer data in 2022_03_07.csv also contain the records from 2022_03_06.csv?
Those 2 files are just 2 versions of the same data: one from a given day (06) and the other from the day after (07).
Using a transformation, we are able to produce the target in curated.
So the question is: do we have to keep the full history in curated, or just the latest view of the dimension?
I understood the second point of your answer as follows: for transactional data (fact tables), it is better to keep each file so that they act as partitions. Did I understand correctly?
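To make the partitioning idea concrete, here is a rough sketch that maps each dated raw file to its own partition folder under curated. The Sales/Orders names, the `load_date=` folder convention, the `part.parquet` file name and the function name are all assumptions for illustration; none of them come from this thread.

```python
import re
from pathlib import PurePosixPath

# Assumed naming convention: each raw file ends with YYYY_MM_DD.csv
DATE_RE = re.compile(r"(\d{4})_(\d{2})_(\d{2})\.csv$")


def partition_path(raw_path, source, entity):
    """Map a dated raw file to a per-day partition folder in curated."""
    m = DATE_RE.search(raw_path)
    if m is None:
        raise ValueError(f"no date found in {raw_path!r}")
    year, month, day = m.groups()
    folder = f"load_date={year}-{month}-{day}"  # hypothetical partition key
    return str(PurePosixPath("Curated", source, entity, folder, "part.parquet"))


for raw in ["Raw/Sales/Orders/2022_03_06.csv", "Raw/Sales/Orders/2022_03_07.csv"]:
    print(raw, "->", partition_path(raw, "Sales", "Orders"))
```

With this layout each day's file stays separate, in contrast to the dimension case where a single date-free file is overwritten.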