What is Azure HDInsight?

Azure HDInsight is a fully managed, open source big data analytics service that offers popular open source frameworks such as Kafka, Storm, R, Spark, Hive, HBase, Phoenix, LLAP, Sqoop, Oozie, and Hadoop.

HDInsight is 100% Apache open source with no lock-in. Customers can move freely between on-premises, Azure, and other clouds because Microsoft does not use any proprietary code in HDInsight.

Manageability & Operations:
• Fully managed service with a 99.9% availability SLA [Industry’s best SLA]
• Highly optimized for performance with the default configuration [No fine-tuning required for great performance]
• Customers have full control over the cluster, and a wide range of customizations is supported
• Cluster scaling via Scale API
• Microsoft monitors the underlying cluster infrastructure and the open source services running on the cluster, and automatically fixes issues through its healing infrastructure
• Integrated with Azure Log Analytics for log management and integrated dashboards
• High availability configurations such as multiple head nodes [if the active head node goes down, your jobs keep running]
• Proven performance at large scale: https://azure.microsoft.com/en-us/blog/hdinsight-interactive-query-performance-benchmarks-and-integration-with-power-bi-direct-query/
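
To make the cluster scaling bullet above concrete, here is a minimal sketch of how a worker-node resize request could be assembled for the Azure Resource Manager REST interface. The api-version value, the helper name build_resize_request, and the sample identifiers are assumptions for illustration and should be checked against the current HDInsight REST reference. The sketch only builds the URL and JSON body. It does not authenticate or send anything.

```python
import json

# Assumptions: the public Azure Resource Manager endpoint and an api-version
# that may differ from the one your subscription should use.
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-06-01"


def build_resize_request(subscription_id, resource_group, cluster_name, target_workers):
    """Return (url, body) for a worker-node resize call. Nothing is sent."""
    if target_workers < 1:
        raise ValueError("target_workers must be at least 1")
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.HDInsight/clusters/{cluster_name}"
        f"/roles/workernode/resize?api-version={API_VERSION}"
    )
    # The request body carries only the desired number of worker nodes.
    body = json.dumps({"targetInstanceCount": target_workers})
    return url, body


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the shape of the request.
    url, body = build_resize_request("my-subscription-id", "my-rg", "my-cluster", 8)
    print("POST", url)
    print(body)
```

In practice the request would be sent as an authenticated POST, or the same operation would be performed through PowerShell or the Azure SDK.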

Enterprise Security:
• VNET and Network Access Control support for perimeter security
• Option for customers to bring their own firewalls
• Active Directory support for multi user configuration
• Role-based isolation and access control for table-, column-, and row-level data via Apache Ranger
• Auditing of all access attempts
• Ranger support for multiple open source frameworks such as Hive, Spark, and LLAP
• Encryption at rest and in transit
• Most comprehensive compliance
• Available in Government Cloud

Data Storage:
• Supports Azure Data Lake Store, Azure Hot Storage, and Azure Cool Storage for data storage
• Shared managed services such as Hive Metastore, Ranger Database, Oozie Database & Sqoop Database

Tooling:
• 100% open source data science tools such as Zeppelin, Jupyter, and RStudio
• Best-in-class Spark debugging support in IntelliJ
• First-class support for Eclipse, Visual Studio, Visual Studio Code, Power BI, and DBeaver
• Native HBase and Phoenix REST SDKs
• Tez View, Grafana and Hive View for monitoring and debugging hive queries
• Cluster orchestration with PowerShell, Azure SDK, ARM templates or Azure Data Factory
• Rich curated marketplace with one-click deploy experience of most popular big data applications
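
As an illustration of what the HBase REST tooling mentioned above works with, the sketch below builds the JSON body that the Apache HBase REST interface expects when writing a row. Row keys, column names, and values are base64-encoded in that format. The helper name build_hbase_put and the sample table contents are hypothetical, and the cluster endpoint and authentication are deliberately left out.

```python
import base64
import json


def _b64(text):
    """Base64-encode a string, as the HBase REST JSON format requires."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_hbase_put(row_key, cells):
    """Build the JSON body for writing one row.

    cells maps "family:qualifier" column names to string values.
    """
    row = {
        "key": _b64(row_key),
        "Cell": [
            {"column": _b64(column), "$": _b64(value)}
            for column, value in cells.items()
        ],
    }
    return json.dumps({"Row": [row]})


if __name__ == "__main__":
    # Hypothetical row: key "row1" with one column in family "cf".
    print(build_hbase_put("row1", {"cf:name": "Ada"}))
```

The resulting string would be sent with a Content-Type of application/json. The SDKs wrap this encoding so that application code does not have to handle it directly.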

Cost:
• HDInsight is the most cost-effective solution in its category
• Per-minute billing, and no additional support services are required for open source components

Use cases:
• ETL/Batch [MR, Pig, Hive, Spark]
• Interactive Exploration [Hive, LLAP, Spark SQL]
• Data Science & Machine Learning [R, Spark ML]
• Streaming [Kafka --> Storm/Spark Streaming --> HBase]
• Lift & Shift [HDP & Cloudera migrations to Azure]