What's new in HDInsight on AKS? (Preview)

Article
09/20/2024

Note

We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.

Only basic support will be available until the retirement date.

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

In HDInsight on AKS, all cluster management and operations for individual clusters are natively supported through service management on the Azure portal.

In HDInsight on AKS, two new concepts are introduced:

Cluster Pools are used to group and manage clusters.
Clusters are used for open source compute workloads and are hosted within a cluster pool.

Cluster Pools

HDInsight on AKS runs on Azure Kubernetes Service (AKS). The top-level resource is the Cluster Pool, which manages all clusters running on the same AKS cluster. When you create a Cluster Pool, an underlying AKS cluster is created at the same time to host all clusters in the pool. Cluster pools are a logical grouping of clusters, which helps build robust interoperability across multiple cluster types and allows enterprises to keep the clusters in the same virtual network. Cluster pools provide rapid and cost-effective access to all the cluster types created on demand and at scale. One cluster pool corresponds to one cluster in the AKS infrastructure.

Clusters

Clusters are individual open source compute workloads, such as Apache Spark, Apache Flink, and Trino, which can be created in a few minutes with preset configurations and a few clicks. Although they run on the same cluster pool, each cluster can have its own configuration, such as cluster type, version, node VM size, and node count. Clusters run on separate compute resources, each with its own DNS and endpoints.
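The relationship between a cluster pool and its clusters can be pictured as a simple containment model. The following Python sketch is illustrative only: the class names, field names, and sample values (pool name, virtual network name, versions, VM sizes) are assumptions made for this example and are not part of any HDInsight on AKS SDK or API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One open source compute workload hosted in a pool (illustrative model only)."""
    name: str
    cluster_type: str   # for example "Spark", "Flink", or "Trino"
    version: str        # hypothetical version string
    node_vm_size: str   # hypothetical VM size name
    node_count: int


@dataclass
class ClusterPool:
    """Top-level resource; stands for one underlying AKS cluster in this sketch."""
    name: str
    virtual_network: str
    clusters: dict[str, Cluster] = field(default_factory=dict)

    def add_cluster(self, cluster: Cluster) -> None:
        # Unique cluster names within a pool are an assumption of this sketch.
        if cluster.name in self.clusters:
            raise ValueError(
                f"cluster {cluster.name!r} already exists in pool {self.name!r}"
            )
        self.clusters[cluster.name] = cluster

    def cluster_types(self) -> set[str]:
        # Distinct workload types currently hosted in the pool.
        return {c.cluster_type for c in self.clusters.values()}


# Two clusters of different types share one pool (and therefore one virtual
# network), while each keeps its own type, version, VM size, and node count.
pool = ClusterPool(name="analytics-pool", virtual_network="example-vnet")
pool.add_cluster(Cluster("etl", "Spark", "3.3", "Standard_D8s_v3", 5))
pool.add_cluster(Cluster("stream", "Flink", "1.17", "Standard_D4s_v3", 3))
print(sorted(pool.cluster_types()))
```

The model keeps per-cluster settings on the `Cluster` object and shared settings, such as the virtual network, on the `ClusterPool`, mirroring the split described above.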

Features currently in preview

The following table lists the features of HDInsight on AKS that are currently in preview. Preview features are sorted alphabetically.

Area	Features
Fundamentals	Create pool and clusters using portal, Web secure shell (SSH) support, Ability to choose number of worker nodes during cluster creation
Storage	ADLS Gen2 Storage support
Metastore	External Metastore support for Trino, Spark and Flink, Integrate with HDInsight
Security	Support for ARM RBAC, Support for MSI based authentication, Option to provide cluster access to other users
Logging and Monitoring	Log aggregation in Azure Log Analytics for server logs, Cluster and Service metrics via Managed Prometheus and Grafana, Support for server metrics in Azure Monitor, Service Status page for monitoring the Service health
Auto Scale	Load based Auto Scale and Schedule based Auto Scale
Customize and Configure Clusters	Support for script actions during cluster creation, Support for library management, Service configuration settings after cluster creation
Trino	Support for Trino catalogs, Trino CLI Support, DBeaver support for query submission, Add or remove plugins and connectors, Support for logging query events, Support for scan query statistics for any Connector in Trino dashboard, Support for Trino dashboard to monitor queries, Query Caching, Integration with Power BI, Integration with Apache Superset, Redash, Support for multiple connectors
Flink	Support for Flink native web UI, Flink support with HMS for DStream, Submit jobs to the cluster using REST API and Azure portal, Run programs packaged as JAR files via the Flink CLI, Support for persistent Savepoints, Support for updating the configuration options while the job is running, Connecting to multiple Azure services: Azure Cosmos DB, Azure Databricks, Azure Data Explorer, Azure Event Hubs, Azure IoT Hub, Azure Pipelines, Azure Data Factory Workflow Orchestration Manager, HDInsight Kafka, Submit jobs to the cluster using Flink CLI and CDC with Flink
Spark	Jupyter Notebook, Support for Delta Lake 2.0, Zeppelin Support, Support for ATS, Support for YARN History server interface, Job submission using SSH, Job submission using SDK and Machine Learning Notebook

Roadmap of Features

Feature	Estimated release timeline	Status
Autoscale - Load Based - Trino	Q1 2024	Completed
Shuffle aware load based auto scale for Spark	Q2 2024	In Progress
In Place Upgrade	Q2 2024	Completed
Reserved Instance Support	Q2 2024	In Progress
MSI based authentication for Metastore (SQL)	Q1 2024	In Progress
Spark 3.4	Q2 2024	In Progress
Trino 426	Q1 2024	Completed
Ranger for RBAC	Q2 2024	In Progress
App mode support for Flink	Q1 2024	Completed
Flink 1.17	Q1 2024	Completed
Spark ACID Support	Q1 2024	In Progress
Configurable SKUs for Headnode, SSH	Q2 2024	In Progress
Flink SQL Gateway Support	Q1 2024	Completed
Private Clusters for HDInsight on AKS	Q1 2024	Completed
Ranger Support for Spark SQL	Q4 2024	In Progress
Ranger ACLs on Storage Layer	Q4 2024	In Progress
Support for OneLake as primary container	Q2 2024	In Progress
