Data integration

RJ 106 Reputation points
2022-09-19T16:17:45.763+00:00

Hi there,

I'm seeking guidance on data integration.

I'm looking to integrate different sources that have different structures, for example when two companies merge. Their data covers the same concepts, but fields that convey the same meaning may be named differently in each system. Company A holds its customer data and company B holds its customer data under different field names. In addition, both may have extra information that the other does not.

What is the best way to reduce ETL and data mapping and just do analytics on top of this disparate data?

Do I need to convert incoming data into JSON, XML, or something else, and use Azure Synapse to read and transform the existing and future incoming data?

For example, query company A's data (Query 1) and union it with a query on company B's data (Query 2)?

In the future there could be data from companies C, D, and E as well.

Is there anything like a schemaless structure that would be better for such data integrations?

I just need high-level information. Links to more detail would also be helpful.

Thanks

Azure Synapse Analytics
Azure Data Lake Analytics

1 answer

  1. Nasreen Akter 10,736 Reputation points
    2022-09-20T13:23:48.2+00:00

    Hi @RJ ,

    Thank you for the ask.

    I think you could do:

    • Convert the files (whatever the source formats are, e.g., JSON, CSV, gzip) into Delta Parquet and save them into the SQL pool or the curated zone of the data lake (the zone that stores structured data) for analysis.
    • Determine the CDM (Common Data Model) for the data, and apply it in the pipeline before creating the Delta Parquet files. For example, if one company sends a field named 'item' and the others send 'B-item', 'C-item', and so on for the same item field, rename the field to the standard name defined in the CDM, and then create the final file.
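
    The renaming step above can be sketched in plain Python. This is only an illustration of the idea, not how a Synapse pipeline would implement it: the `FIELD_MAPS` table, the standard names (`item`, `customer_name`), the sample records, and the `extras` bucket for non-common fields are all assumptions made up for the example.

```python
# Hypothetical per-company field mappings onto a common (CDM-style) schema.
# Every field name below is an illustrative assumption, not a real CDM entity.
FIELD_MAPS = {
    "A": {"item": "item", "cust_name": "customer_name"},
    "B": {"B-item": "item", "clientName": "customer_name"},
    "C": {"C-item": "item", "customer": "customer_name"},
}


def to_common_schema(record, company):
    """Rename known source fields to standard names; keep the rest as extras."""
    mapping = FIELD_MAPS[company]
    common = {"source_company": company, "extras": {}}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            # Non-common fields are preserved rather than dropped.
            common["extras"][field] = value
    return common


def union_sources(sources):
    """Standardize every record from every company and combine them in one list."""
    combined = []
    for company, records in sources.items():
        combined.extend(to_common_schema(r, company) for r in records)
    return combined


# Made-up sample data for two companies.
sources = {
    "A": [{"item": "widget", "cust_name": "Ann", "loyalty_tier": "gold"}],
    "B": [{"B-item": "gadget", "clientName": "Bob"}],
}
rows = union_sources(sources)
for row in rows:
    print(row)
```

    With this shape, adding company D or E later would mean adding one more entry to the mapping table rather than rewriting the analytics queries, and the standardized rows could then be written out as the final file.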

    Hope this helps. Thanks!