Hi @RJ,
Thank you for the question.
I think you could do the following:
- convert the files (whatever the source formats are, e.g., JSON, CSV, gzip, etc.) into Delta Parquet and save them to the SQL pool or the curated zone (which stores structured data) in the data lake for analysis
- determine the CDM (Common Data Model) for the data and apply it in the pipeline before creating the Delta Parquet. For example, if one company sends a field named 'item' while others send 'B-item', 'C-item', and so on for the same item field, rename the field to the standard name defined in the CDM and then create the final file (see the sketch after this list).
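
To make the two steps concrete, here is a minimal PySpark sketch that reads a raw file, renames company-specific columns to the CDM standard names, and writes the result as Delta into the curated zone. It assumes a Spark pool with the Delta Lake library available; the storage paths, container names, field names, and the `cdm_mapping` dictionary are all illustrative placeholders, not your actual CDM.

```python
# Minimal sketch: raw file -> CDM-standardized columns -> Delta in the curated zone.
# Paths, container names, and field names below are hypothetical examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical raw and curated zone locations in the data lake.
raw_path = "abfss://raw@yourstorageaccount.dfs.core.windows.net/companyB/items.csv"
curated_path = "abfss://curated@yourstorageaccount.dfs.core.windows.net/items_delta"

# Step 1: read the source file (Spark reads gzip-compressed CSV/JSON transparently).
df = spark.read.option("header", "true").csv(raw_path)

# Step 2: apply the CDM mapping, renaming company-specific field names
# to the standard names defined in your CDM (illustrative mapping only).
cdm_mapping = {
    "B-item": "item",
    "B-qty": "quantity",
}
for source_name, standard_name in cdm_mapping.items():
    if source_name in df.columns:
        df = df.withColumnRenamed(source_name, standard_name)

# Step 3: write the standardized data as Delta into the curated zone.
df.write.format("delta").mode("append").save(curated_path)
```

In a Synapse pipeline this would typically run as a notebook (or Mapping Data Flow) activity, executed once per source company, so every incoming file lands in the curated zone with the same CDM column names.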
Hope this helps. Thanks!