Synapse analytics for data ingestion

Samy Abdul 3,366 Reputation points
2021-07-14T12:23:05.987+00:00

Hi All,

I have seen a architecture which suggests Synapse Analytics for data ingestion and afterwards data is stored in ADLS Gen2 and again data pushed in to Synapse as well as Azure SQL database . Synapse used again third time and finally data being pushed in to reporting solution.

My questions are why and what purpose Synapse being used here given the fact it is expensive service , why not data being directly ingested in to ADLS?
and I am failing to truly understand the underlying purpose of second and third time usage of Synapse. To be honest, the design has left me quite confused and perplexed. Appreciate if someone could please provide detail answers so that I can get the clarity.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,251 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,338 questions
0 comments No comments
{count} votes

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 75,131 Reputation points Microsoft Employee
    2021-07-15T05:33:27.513+00:00

    Hello @Samy Abdul ,

    Thanks for the question and using MS Q&A platform.

    This example workload shows several ways that SMBs can modernize legacy data stores and explore big data tools and capabilities, without overextending current budgets and skillsets. These end-to-end Azure data warehousing solutions integrate easily with Azure and Microsoft services and tools like Azure Machine Learning, Microsoft Power Platform, and Microsoft Dynamics.

    114830-image.png

    Load and Ingest: Azure Synapse Analytics pipelines ingest the legacy data warehouses into Azure.

    • The pipelines orchestrate the flow of migrated or partially refactored legacy databases and SSIS packages into Azure SQL Database. This lift-and-shift approach is fastest to implement, and offers a smooth transition from an on-premises SQL solution to an eventual Azure platform-as-a-service (PaaS). You can modernize databases incrementally after the lift and shift.
    • The pipelines can also pass unstructured, semi-structured, and structured data into Azure Data Lake Storage for centralized storage and analysis with other sources. Use this approach when fusing data provides more business benefit than simply replatforming the data.

    Store: The data can also enter the centralized Data Lake for further analysis, storage, and reporting.

    Process/Manipulate: Serverless analysis tools are available in the Azure Synapse Analytics workspace. These tools use serverless SQL pool or Apache Spark compute capabilities to process the data in Data Lake Storage. Serverless pools are available on demand, and don't require any provisioned resources.

    Collaborate/Consume: Power BI can query a semantic model stored in Analysis Services, or it can query Azure Synapse directly.

    Consume/Serve: Business analysts use Power BI reports and dashboards to analyze data and derive business insights.

    Why Synapse Analytics used three time in the architecture?

    114894-image.png

    First time: Used for Load and ingest data.

    Azure Synapse contains the same Data Integration engine and experiences as Azure Data Factory, allowing you to create rich at-scale ETL pipelines without leaving Azure Synapse Analytics.

    • Ingest data from 90+ data sources
    • Code-Free ETL with Data flow activities
    • Orchestrate notebooks, Spark jobs, stored procedures, SQL scripts, and more

    Example: Copying data from on-premise to Azure Data Lake gen2. The date contains two csv files one csv contains employee details and other contains department details.

    Second Time: Used for processing and manipulate the data.

    You can use Synapse SQL or Apache Spark.

    • Synapse SQL is a distributed query system for T-SQL that enables data warehousing and data virtualization scenarios and extends T-SQL to address streaming and machine learning scenarios.
    • Apache Spark for Azure Synapse deeply and seamlessly integrates Apache Spark--the most popular open source big data engine used for data preparation, data engineering, ETL, and machine learning.

    Example: By using the mapping data flow joined the both the csv files based on the primary key and then saved into the Azure Data Lake Gen2.

    Third Time: Link your Azure Synapse workspace to your new Power BI workspace and creating dataset and building report.

    Created a Power BI workspace, link your Azure Synapse workspace, and then create a Power BI data set that utilizes data in your Azure Synapse workspace.

    Example: I had created a power BI dataset utilizing the manipulated data which stored in the ADLS gen2 and then created a Power BI reports and dashboards to analyze data and derive business insights.

    The solution described in this article combines a range of Azure services that will ingest, store, process, enrich, and serve data and insights from different sources (structured, semi-structured, unstructured, and streaming).

    114788-image.png

    For more details, refer to Analytics end-to-end with Azure Synapse.

    Hope this helps. Do let us know if you any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.

    2 people found this answer helpful.

0 additional answers

Sort by: Most helpful