Building a modern datawarehouse (DL and Synapse)

Thomas McNeilly 20 Reputation points
2023-02-09T12:48:26.6666667+00:00

I'm a newbie in world of data warehousing and Microsoft Azure and I want to understand a basic question.

  • Say we have 2 operational databases, Finance and LOB, and the objective is to create a new datawarehouse.
  • So we have 2 db stores, and need populate db store for datawarehouse (DW). I'm thinking DW will need lots of the Finance and LOB data (for stage and transform say)

Question - does it make sense to use Data Lake (DL) to store the Finance and LOB data, before then storing in DW?

i.e. db store -> file store -> db store

*Note, I already have a need for DL for other source data.

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
{count} votes

Answer accepted by question author
  1. Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
    2023-02-13T23:11:14.33+00:00

    Hello @Thomas McNeilly,

    Welcome to the MS Q&A platform.

    Yes, using Data lake to store your finance and lob data before storing it in DW makes sense. The reason is that data lake provides more flexibility and scalability in data processing and analysis.

    Data lakes are optimized for scaling to terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources, and may be structured, semi-structured, or unstructured.
    The process you outlined, moving data from operational databases to a data lake, then transforming and loading it into the data warehouse, is a common approach in data warehousing and ETL process.

    You can use Azure Data Factory/Synapse to move data from the operational databases to Data Lake. Once the data is in Data Lake, you can use Azure Data Lake Analytics or Azure HDInsight to run data analysis jobs on the data. Afterward, you can use ADF/synapse to move the data from Data Lake to the data warehouse.

    In addition, data lake provides the benefits of data centralization, improved data quality, cost savings etc.

    The below architecture diagram demonstrates a data pipeline that integrates large amounts of data from multiple sources into a unified analytics platform in Azure.

    Potential use cases from the architecture are:

    Establish a data warehouse to be a single source of truth for your data.

    Integrate relational data sources with other unstructured datasets.

    Use semantic modeling and powerful visualization tools for simpler data analysis.

    User's image

    Reference documents:

    https://learn.microsoft.com/en-us/azure/architecture/example-scenario/data/data-warehouse

    https://learn.microsoft.com/en-us/azure/architecture/data-guide/scenarios/data-lake

    I hope this helps. Please let me know if you have any other specific questions here.

    2 people found this answer helpful.
    0 comments No comments

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.