What's new in HDInsight on AKS? (Preview)
Note
We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.
Only basic support will be available until the retirement date.
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.
In HDInsight on AKS, management and operations for individual clusters are natively supported in the Azure portal.
In HDInsight on AKS, two new concepts are introduced:
- Cluster Pools are used to group and manage clusters.
- Clusters run open source compute workloads and are hosted within a cluster pool.
Cluster Pools
HDInsight on AKS runs on Azure Kubernetes Service (AKS). The top-level resource is the Cluster Pool, which manages all clusters running on the same AKS cluster. When you create a Cluster Pool, an underlying AKS cluster is created at the same time to host all clusters in the pool. Cluster pools are a logical grouping of clusters, which helps build robust interoperability across multiple cluster types and allows enterprises to keep the clusters in the same virtual network. Cluster pools provide rapid and cost-effective access to all the cluster types, created on demand and at scale. One cluster pool corresponds to one cluster in the AKS infrastructure.
Clusters
Clusters are individual open source compute workloads, such as Apache Spark, Apache Flink, and Trino, which can be created in a few minutes with preset configurations and a few clicks. Although clusters run in the same cluster pool, each can have its own configuration, such as cluster type, version, node VM size, and node count. Each cluster runs on separate compute resources with its own DNS and endpoints.
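The relationship between a cluster pool and its clusters can be pictured with a small data model. The following is a minimal sketch only, not an Azure SDK or ARM schema: the class names `ClusterPool` and `Cluster`, their fields, and all example values (pool name, virtual network name, versions, VM sizes) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One open source compute workload hosted in a pool (hypothetical model)."""
    name: str
    cluster_type: str   # for example "Spark", "Flink", or "Trino"
    version: str        # placeholder value, not a real release number
    node_vm_size: str   # placeholder value, not a real VM SKU
    node_count: int


@dataclass
class ClusterPool:
    """Top-level resource: one pool maps to one underlying AKS cluster."""
    name: str
    virtual_network: str
    clusters: dict[str, Cluster] = field(default_factory=dict)

    def add_cluster(self, cluster: Cluster) -> None:
        # Cluster names are assumed to be unique within a pool in this sketch.
        if cluster.name in self.clusters:
            raise ValueError(f"cluster {cluster.name!r} already exists in pool")
        self.clusters[cluster.name] = cluster

    def cluster_types(self) -> set[str]:
        # Different cluster types can share the same pool.
        return {c.cluster_type for c in self.clusters.values()}


# All clusters in the pool share its virtual network, while each
# keeps its own type, version, VM size, and node count.
pool = ClusterPool(name="example-pool", virtual_network="example-vnet")
pool.add_cluster(Cluster("spark-etl", "Spark", "example-version", "example-vm-size", 5))
pool.add_cluster(Cluster("trino-adhoc", "Trino", "example-version", "example-vm-size", 3))
```

In this model the pool owns the shared settings (here, only the virtual network), and every per-cluster setting named in the text lives on the individual `Cluster` object.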
Features currently in preview
The following table lists the features of HDInsight on AKS that are currently in preview. Features are grouped by area.
Area | Features |
---|---|
Fundamentals | Create pool and clusters using the portal, Web secure shell (SSH) support, Ability to choose the number of worker nodes during cluster creation |
Storage | ADLS Gen2 Storage support |
Metastore | External Metastore support for Trino, Spark and Flink, Integrate with HDInsight |
Security | Support for ARM RBAC, Support for MSI-based authentication, Option to provide cluster access to other users |
Logging and Monitoring | Log aggregation in Azure Log Analytics for server logs, Cluster and service metrics via Managed Prometheus and Grafana, Support for server metrics in Azure Monitor, Service Status page for monitoring service health |
Auto Scale | Load-based autoscale, Schedule-based autoscale |
Customize and Configure Clusters | Support for script actions during cluster creation, Support for library management, Service configuration settings after cluster creation |
Trino | Support for Trino catalogs, Trino CLI Support, DBeaver support for query submission, Add or remove plugins and connectors, Support for logging query events, Support for scan query statistics for any Connector in Trino dashboard, Support for Trino dashboard to monitor queries, Query Caching, Integration with Power BI, Integration with Apache Superset, Redash, Support for multiple connectors |
Flink | Support for Flink native web UI, Flink support with HMS for DStream, Submit jobs to the cluster using REST API and Azure portal, Run programs packaged as JAR files via the Flink CLI, Support for persistent savepoints, Support for updating configuration options while the job is running, Connecting to multiple Azure services: Azure Cosmos DB, Azure Databricks, Azure Data Explorer, Azure Event Hubs, Azure IoT Hub, Azure Pipelines, Azure Data Factory Workflow Orchestration Manager, HDInsight Kafka, Submit jobs to the cluster using Flink CLI and CDC with Flink |
Spark | Jupyter Notebook, Support for Delta Lake 2.0, Zeppelin support, Support for ATS, Support for YARN History Server interface, Job submission using SSH, Job submission using SDK and Machine Learning Notebook |
Roadmap of Features
Feature | Estimated release timeline | Status |
---|---|---|
Autoscale - Load Based - Trino | Q1 2024 | Completed |
Shuffle-aware load-based autoscale for Spark | Q2 2024 | In Progress |
In Place Upgrade | Q2 2024 | Completed |
Reserved Instance Support | Q2 2024 | In Progress |
MSI based authentication for Metastore (SQL) | Q1 2024 | In Progress |
Spark 3.4 | Q2 2024 | In Progress |
Trino 426 | Q1 2024 | Completed |
Ranger for RBAC | Q2 2024 | In Progress |
App mode support for Flink | Q1 2024 | Completed |
Flink 1.17 | Q1 2024 | Completed |
Spark ACID Support | Q1 2024 | In Progress |
Configurable SKUs for Headnode, SSH | Q2 2024 | In Progress |
Flink SQL Gateway Support | Q1 2024 | Completed |
Private Clusters for HDInsight on AKS | Q1 2024 | Completed |
Ranger Support for Spark SQL | Q4 2024 | In Progress |
Ranger ACLs on Storage Layer | Q4 2024 | In Progress |
Support for OneLake as primary container | Q2 2024 | In Progress |