KranthiPakala-MSFT asked panlondon1-2476 commented

Differences between HDInsight and Azure Databricks?

I know that HDInsight offers several cluster types, whereas Databricks offers only Spark clusters. I believe there must be other significant differences that would influence which one to choose for an implementation.

[Note: As we migrate from MSDN, this question has been posted by an Azure Cloud Engineer as a frequently asked question] Source: MSDN


1 Answer

HimanshuSinhamfst-5269 answered panlondon1-2476 commented

Welcome to the Microsoft Q&A (Preview) platform.

Happy to answer your question.

Azure HDInsight is a cloud distribution of the Hadoop components from the Hortonworks Data Platform (HDP). Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks, such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. For more details, see the Azure HDInsight documentation.

Azure HDInsight brings both Hadoop and Spark under the same umbrella and lets enterprises manage both with the same set of tools, such as Apache Ambari and Apache Ranger. It also offers an industry-standard notebook experience, with support for both Jupyter and Zeppelin notebooks. Enterprises that want this ease of management across all their big data workloads can choose HDInsight.

Azure Databricks is a premium Spark offering that is ideal for customers who want their data scientists to collaborate easily and to run their Spark-based workloads efficiently, with industry-leading performance.

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. For more details, see the Azure Databricks documentation.

Here is a comparison of Azure HDInsight and Azure Databricks.

For more details, refer to the MSDN thread that addresses a similar question.

Hope this helps.

Sourced from MSDN – Azure HDInsight


Hi @HimanshuSinhamfst-5269
Thank you for your response. It gives more clarity.
In my organization, we rely heavily on an HDInsight Spark cluster to run predictive batch jobs; the cluster has 40 nodes and the jobs run for 4 hours. We would like to know whether a Databricks cluster would provide better performance and finish the jobs faster.

Do you have any recommendations regarding performance, or is a benchmark comparison available?

BhavaniPJarajapu-3045 replied to KumarGauravAdminAccount-9183

If you only need a Spark cluster, Databricks provides better performance than HDInsight. If your batch jobs are very long-running with high compute requirements, then HDInsight is a better option.
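To check a claim like this for your own workload, one fair approach is to run the same job several times on each platform and compare the median wall-clock time and the node-hours consumed. The sketch below assumes a hypothetical `run_job` callable that submits the job and blocks until it finishes; `median_runtime_seconds` and `node_hours` are illustrative helpers, not part of either service's API. Node-hours are only a starting point for cost, since the two services bill differently.

```python
import statistics
import time
from typing import Callable


def median_runtime_seconds(run_job: Callable[[], None], repeats: int = 3) -> float:
    """Run the (hypothetical) job `repeats` times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_job()  # assumed to block until the job completes
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def node_hours(nodes: int, runtime_seconds: float) -> float:
    """Cluster size multiplied by runtime expressed in hours."""
    return nodes * runtime_seconds / 3600.0


# The job described above: 40 nodes running for 4 hours.
print(node_hours(40, 4 * 3600))  # 160.0
```

Using the median rather than the mean keeps a single slow run (for example, one affected by cluster start-up) from skewing the comparison.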

panlondon1-2476 replied to KumarGauravAdminAccount-9183

Have you considered SQL Server, or Azure SQL if you prefer? It handles billions of rows of data with no issues, and queries come back in 3-4 seconds. But it depends on whether your data is structured: SQL Server needs structured data, and I have no experience with unstructured data on SQL Server.