What is the difference between : Azure Synapsis Analytics - Azure Databricks - Azure HD insight

CloudRock 366 Reputation points
2020-08-06T12:48:15.723+00:00

Hello Everybody,

I'm running a project where we need to propose an azure-based architecture to import data from an on-premises data warehouse (databases) to azure-based data platform.

Data are aimed to be exposed to company operators through a web visualization UI (with some analytics capabilities).

Data needs to be captured from on-premises, transformed from RAW to daily curated and exposed to end-user, using vizualisation tool.

I need your opinions to discuss benefits of using each of below services:

  • Azure Databricks OR Azure Synapsis Analytics OR Azure HD insight
  • Azure Analysis services
  • Azure SQL Database OR Azure Cosmos DB
  • Azure Data Factory OR Azure Data Migration Assistant
  • Azure Data Lake OR Azure Blob Storage

Here some constraints:

  • As compliance with modularity and service segregation principles, we need to separate jobs into different nodes.
  • Data migration will be performed by Azure Data Factory, but I need to know if Azure Data Migration assistant is also suitable in this case ?
  • Data should be stored in a resource that permit transformation, is Data Lake the best answer for this ?
  • For data transformation jobs from RAM to daily curated, would Databricks be the best solution for this ?
  • Would Azure Analysis Services be the best azure tool to interface with end-user visualization UI to perform some analytics ?
Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,338 questions
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,363 questions
Azure HDInsight
Azure HDInsight
An Azure managed cluster service for open-source analytics.
198 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
1,917 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,529 questions
0 comments No comments
{count} votes

Accepted answer
  1. Vaibhav Chaudhari 38,576 Reputation points
    2020-08-06T14:13:10.377+00:00

    You have identified the Azure services almost correctly.

    If it is one time migration like lift and shift database from On-premise SQL or any other source to Azure SQL etc, you can use Data migration assistant - datamigration.microsoft.com

    In your case I think you will most probably have to use Azure data factory where you set up a data ingestion flow and schedule the daily runs.

    ADLS Gen2 is the resource where you can store the data and this data can be transformed

    Azure databricks can be used for data processing, AAS can be used to create tabular model which will later be connected to PBI for visualization

    Below is Modern data warehouse architecture proposed by Microsoft.

    Reference -
    modern-data-warehouse
    data-warehousing

    16050-image.png

    ===============================================

    If the response helped, do "Accept Answer" and upvote it -- Vaibhav

    2 people found this answer helpful.

2 additional answers

Sort by: Most helpful
  1. Tavares Junior, Ivan 36 Reputation points
    2021-03-03T23:57:24.967+00:00

    3-explore-azure-data-services-warehousing

    Compare Analysis Services with Synapse Analytics:

    Azure Analysis Services has significant functional overlap with Azure Synapse Analytics, but it's more suited for processing on a smaller scale.

    • Use Azure Synapse Analytics for:

    Very high volumes of data (multi-terabyte to petabyte sized datasets).
    Very complex queries and aggregations.
    Data mining, and data exploration.
    Complex ETL operations. ETL stands for Extract, Transform, and Load, and refers to the way in which you can retrieve raw data from multiple sources, convert this data into a standard format, and store it.
    Low to mid concurrency (128 users or fewer).

    • Use Azure Analysis Services for:

    Smaller volumes of data (a few terabytes).
    Multiple sources that can be correlated.
    High read concurrency (thousands of users).
    Detailed analysis, and drilling into data, using functions in Power BI.
    Rapid dashboard development from tabular data.

    • Combine Analysis Services with Synapse Analytics:

    Many scenarios can benefit from using Synapse Analytics and Analysis Services together. If you have large amounts of ingested data that require preprocessing, you can use Synapse Analytics to read this data and manipulate it into a model that contains business information rather than a large amount of raw data. The scalability of Synapse Analytics gives it the ability to process and reduce many terabytes of data down into a smaller, succinct dataset that summarizes and aggregates much of this data. You can then use Analysis Services to perform detailed interrogation of this information, and visualize the results of these inquiries with Power BI.

    7 people found this answer helpful.
    0 comments No comments

  2. Andrei Calin Juganaru 1 Reputation point
    2021-08-12T18:40:38.117+00:00

    @CloudRock I guess you had some time to experiment/decide. I am interested as well in one of your questions:
    Azure Databricks OR Azure Synapse Analytics OR Azure HD insight?

    I understand the difference between Analysis Services and Synapse Analytics.
    However, it is still unclear to me what is the best choice between Databricks vs. Synapse vs. HD Insight.

    Any input is highly appreciated! Thanks!

    0 comments No comments