Data integration

Question

RJ 326

Hi there,

I'm seeking guidance on data integration.

I'm looking to integrate different sources that have different structures. For example, two companies are merging. Their data covers the same concepts, but a field that conveys the same meaning could be named differently in each system. Company A has its customer data and company B has its customer data under different field names. In addition, each of them may have extra information that the other does not.

What is the best way to reduce ETL and data mapping and just do analytics on top of this disparate data?

Do I need to convert incoming data into JSON or XML or something else and use Azure Synapse to read and transform the existing and future incoming data?

For example, query 1 for company A data, unioned with query 2 for company B data?

In the future there could be data from companies C, D, and E as well.

Is there anything like a schemaless structure that would be better for such data integrations?

I just need high-level information. Links to more detail would be helpful too.

Thanks

1 answer

Answer 1

Nasreen Akter 10,811 Volunteer Moderator

Hi @RJ ,

Thank you for the ask.

I think you could do:

Convert the files (whatever the source formats are, e.g., JSON, CSV, gzip) into Delta Parquet and save them in the SQL pool or in the curated zone (which stores structured data) of the data lake for analysis.
Determine the CDM (Common Data Model) for the data and apply it in the pipeline before creating the Delta Parquet files. For example, if one company sends an item field named 'item' and the others send 'B-item', 'C-item', and so on, rename the field to the standard name defined in the CDM and then create the final file.
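
As a rough illustration of the renaming step, here is a minimal sketch in plain Python. The field names, the mapping table, and the `to_common_model` and `union_sources` helpers are all made up for the example, not part of any CDM or Synapse API. Each source gets its own mapping to the standard names, unmapped fields are kept in an `extras` dictionary so non-common information is not lost, and the renamed records are then unioned. In a real pipeline the same mapping idea would likely run in a Spark or pipeline activity before the Delta Parquet files are written.

```python
from typing import Any

# Hypothetical per-source mappings: source field name -> standard name.
# Adding company C later would mean adding one more entry here.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "company_a": {"item": "item", "cust_name": "customer_name"},
    "company_b": {"B-item": "item", "customerName": "customer_name"},
}


def to_common_model(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename known fields to the standard names; keep the rest in 'extras'."""
    mapping = FIELD_MAPS[source]
    out: dict[str, Any] = {"source": source, "extras": {}}
    for field, value in record.items():
        if field in mapping:
            out[mapping[field]] = value
        else:
            # Non-common information is preserved rather than dropped.
            out["extras"][field] = value
    return out


def union_sources(batches: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Apply the mapping to every record and concatenate all sources."""
    return [
        to_common_model(source, record)
        for source, records in batches.items()
        for record in records
    ]


# Made-up sample data for illustration only.
batches = {
    "company_a": [{"item": "widget", "cust_name": "Ada", "loyalty_tier": "gold"}],
    "company_b": [{"B-item": "gadget", "customerName": "Grace"}],
}

combined = union_sources(batches)
for row in combined:
    print(row)
```

The point of keeping the mapping as data rather than code is that onboarding a new company only requires a new mapping entry, while the analytics queries keep reading the same standard field names.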

Hope this helps. Thanks!

RJ 326 Reputation points

2022-09-20T20:34:30.067+00:00

Thanks @Nasreen Akter. Both of your points are new to me. I'm still exploring what the advantages are of converting all the source formats to Delta Parquet.

CDM is also new to me, but at a glance, my initial understanding is that CDM is something like a precategorized field list that I could use to map the fields.

I'm exploring... but thanks for sharing

I will keep this open for a few more days to see if there are any more suggestions from others, and then close it.

Thanks,

ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator

2022-09-27T16:05:26.677+00:00

Hi @RJ, the points shared by @Nasreen Akter above suit this requirement best. Hope that helps. If so, please consider hitting the Accept Answer button. Accepted answers help the community as well. Please let us know if you have any further queries. Thank you.
