Difference between Synapse and Databricks

Shivam Ramola 66 Reputation points
2021-10-12T08:45:54.76+00:00

What are the major differences between Synapse and Databricks,
and under what circumstances should one use Synapse, and vice versa?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Tasadduq Burney 8,426 Reputation points MVP
    2021-10-12T09:30:00.807+00:00

    Hello @Shivam Ramola !

    Hope you are having a great day!
    Thank you for asking a question! We are glad to assist you!

    Databricks:

    Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

    For more information on Databricks, see https://learn.microsoft.com/en-us/azure/databricks/

    Synapse Analytics:

    Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and big data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

    For more information on Synapse, see https://learn.microsoft.com/en-us/azure/synapse-analytics/

    They overlap to some extent, but they are not the same thing. Databricks is essentially managed Apache Spark, whereas Synapse Analytics is built around a managed SQL data warehouse (formerly Azure SQL Data Warehouse).

    When to use Databricks and Synapse Analytics

    Machine learning development – preferred: Databricks

    Databricks
    Provides ML-optimized Databricks runtimes that include some of the most popular libraries (e.g. TensorFlow, PyTorch, Keras) and support GPU-enabled clusters
    Provides a managed, hosted version of MLflow with integrated enterprise security and some other Databricks-only capabilities
    You can use Azure ML from Databricks
    Tight version control integration (Git) plus CI/CD on full environments
    Synapse
    Built-in support for Azure ML
    You can use open-source MLflow
    No full Git experience or multi-user collaboration on notebooks
    No full CI/CD yet on environments and dependencies
    Reflection: based on currently available features, Databricks goes broader in ML features within Spark and gives a more comfortable developer experience (e.g. use of IDEs).

    Ad-hoc data lake discovery – both Synapse & Databricks

    Databricks – you can query data from the data lake by first mounting the data lake to your Databricks workspace and then using Python, Scala, or R to read the data
    Synapse – you can use the SQL on-demand (serverless) pool or Spark to query data from your data lake
    Reflection: use whichever tool or UI you prefer. If you are a BI developer familiar with SQL and Synapse, Synapse is a good fit; if you are a data scientist who works only in notebooks, use Databricks to explore your data lake.
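The two routes above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the storage account `mylake`, the container `raw`, the folder `sales/2021`, and all helper names are hypothetical, and `spark` is assumed to be the session that a Databricks or Synapse Spark notebook already provides. The Spark helper accepts either an `abfss://` URI or a mount path, so with a mounted data lake you would pass the path under your mount point instead.

```python
def abfss_uri(container, account, path):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_lake_parquet(spark, location):
    """Read Parquet files into a Spark DataFrame.

    `location` may be an abfss:// URI or a mount path; `spark` is the
    notebook-provided SparkSession (an assumption of this sketch).
    """
    return spark.read.parquet(location)


def openrowset_query(account, container, pattern, top=10):
    """Build a T-SQL query string for a Synapse serverless (SQL on-demand) pool."""
    url = f"https://{account}.dfs.core.windows.net/{container}/{pattern.lstrip('/')}"
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM OPENROWSET(BULK '{url}', FORMAT = 'PARQUET') AS [result]"
    )


if __name__ == "__main__":
    # Hypothetical account, container, and folder names.
    print(abfss_uri("raw", "mylake", "sales/2021"))
    print(openrowset_query("mylake", "raw", "sales/2021/*.parquet"))
```

In a notebook, `read_lake_parquet(spark, abfss_uri("raw", "mylake", "sales/2021"))` would return a DataFrame, while the generated query text is meant to be run against the serverless SQL endpoint.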

    Real-time transformations – preferred: Databricks

    Databricks
    Spark Structured Streaming as part of Databricks is proven to work seamlessly (with extra features in the Databricks Runtime, e.g. Z-order clustering when using Delta, join optimizations, etc.)
    Auto Loader – newer Databricks functionality for incrementally ingesting new data files as they arrive in cloud storage
    Synapse
    As a data warehouse, Synapse can ingest real-time data using Azure Stream Analytics, but this currently doesn’t support Delta. As a developer platform, Synapse doesn’t fully focus on real-time transformations yet.
    Reflection: use Databricks if you want to use Spark’s Structured Streaming (and thus advanced transformations) and load real-time data into your Delta Lake.
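For illustration, an Auto Loader stream that lands files in a Delta table might look like the sketch below. It assumes a Databricks notebook where `spark` is already defined (the `cloudFiles` source is Databricks-specific and not part of open-source Spark); the JSON input format, all paths, and the function names are hypothetical.

```python
def autoloader_options(file_format, schema_location):
    """Options for the Databricks-only `cloudFiles` (Auto Loader) source."""
    return {
        "cloudFiles.format": file_format,             # format of the incoming files
        "cloudFiles.schemaLocation": schema_location, # where the inferred schema is tracked
    }


def start_stream(spark, source_path, target_path, checkpoint_path):
    """Incrementally pick up new files and append them to a Delta table.

    `spark` is the notebook-provided SparkSession (an assumption of this sketch).
    """
    incoming = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options("json", checkpoint_path))
        .load(source_path)
    )
    return (
        incoming.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)  # tracks which files were processed
        .outputMode("append")
        .start(target_path)
    )


# Hypothetical usage inside a Databricks notebook:
# query = start_stream(
#     spark,
#     "/mnt/lake/landing/events",
#     "/mnt/lake/delta/events",
#     "/mnt/lake/_checkpoints/events",
# )
```

Transformations (filters, joins, aggregations) would go between the read and the write; they are left out here to keep the shape of the pipeline visible.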

    SQL Analyses & Data warehousing – preferred: Synapse

    Synapse
    A full data warehouse supporting a full relational data model, stored procedures, etc.
    Provides all the SQL features BI developers are used to, including a full standard T-SQL experience
    Brings together the best SQL technologies, including columnar indexing

    Databricks
    A Delta Lake-based data warehouse is possible, but without the full breadth of SQL and data warehousing capabilities of a traditional data warehouse.
    Databricks leverages the Delta Lakehouse paradigm, offering core BI functionality but not a full traditional SQL BI data warehouse experience.
    Doesn’t provide a full T-SQL experience (it uses Spark SQL instead)

    Reporting and self-service BI – preferred: Synapse

    Synapse
    You can use Power BI directly from Synapse Studio
    The SQL pool (SQL DWH) is a leader in enterprise data warehousing

    Regards,
    Tasadduq Burney

