Resource prerequisites

Note

We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.

Only basic support will be available until the retirement date.

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

This article details the resources required for getting started with HDInsight on AKS. It covers the necessary and the optional resources and how to create them.

Necessary resources

The following table lists the resources required for cluster creation, by cluster type.

Workload | Managed Service Identity (MSI) | Storage | SQL Server - SQL Database | Key Vault
Trino | Yes | No | No | No
Flink | Yes | Yes | No | No
Spark | Yes | Yes | No | No
Trino, Flink, or Spark with Hive Metastore (HMS) | Yes | Yes | Yes | Yes

Note

MSI is used as the security standard for authentication and authorization across resources, except SQL Database. The role assignment occurs before deployment to authorize the MSI on storage, and the secrets for SQL Database are stored in Key Vault. Storage is supported with ADLS Gen2 and is used as the data store for the compute engines. SQL Database is used for table management on Hive Metastore.
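The per-workload requirements can be captured as a small lookup for a pre-deployment check. The following is a minimal sketch: `REQUIRED_RESOURCES` and `missing_resources` are hypothetical names, not part of any Azure SDK or HDInsight on AKS tooling, and the mapping follows the prerequisites this article lists for each workload.

```python
# Hypothetical helper, not part of any Azure SDK or HDInsight on AKS tooling.
# The mapping mirrors the prerequisites listed per workload in this article.
REQUIRED_RESOURCES = {
    "trino": {"msi"},
    "flink": {"msi", "storage"},
    "spark": {"msi", "storage"},
    "hms": {"msi", "storage", "sql_database", "key_vault"},
}


def missing_resources(workload, available):
    """Return the sorted list of resources still needed for a workload."""
    required = REQUIRED_RESOURCES[workload.lower()]
    return sorted(required - set(available))


# Example: a Spark cluster with only an MSI in place still needs storage.
print(missing_resources("spark", ["msi"]))
print(missing_resources("hms", ["msi", "storage"]))
```

Here `"hms"` stands for Trino, Flink, or Spark with Hive Metastore; the key names are illustrative only.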

Optional resources

Note

  • The VNet requires a subnet that has no existing route table associated with it.
  • HDInsight on AKS lets you bring your own VNet and subnet, so you can customize your network to suit the needs of your enterprise.
  • A Log Analytics workspace is optional. Create it ahead of time if you want to use Azure Monitor capabilities such as Azure Log Analytics.
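The route table requirement can be checked before you pick a subnet. The following is a minimal sketch that assumes the subnet is available as a dictionary shaped like the ARM JSON for a subnet, where an associated route table appears under `properties.routeTable.id`. The function name and the sample values are hypothetical.

```python
def subnet_has_route_table(subnet):
    """Return True if an ARM-style subnet dict references a route table."""
    properties = subnet.get("properties", {})
    route_table = properties.get("routeTable")
    return bool(route_table and route_table.get("id"))


# Illustrative subnets; names and IDs are placeholders.
clean_subnet = {"name": "cluster-subnet", "properties": {"addressPrefix": "10.0.0.0/24"}}
routed_subnet = {
    "name": "routed-subnet",
    "properties": {"routeTable": {"id": "/subscriptions/.../routeTables/example"}},
}

for subnet in (clean_subnet, routed_subnet):
    status = "has a route table" if subnet_has_route_table(subnet) else "no route table"
    print(subnet["name"], "->", status)
```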

You can create the necessary resources in two ways:

Using ARM templates

The following ARM templates let you create the specified necessary resources in one click, using a resource prefix and other details as required.

For example, if you provide the resource prefix “demo”, the following resources are created in your resource group, depending on the template you select:

  • MSI is created with the name demoMSI.
  • Storage is created with the name demostore, along with a container named democontainer.
  • Key Vault is created with the name demoKeyVault, along with the secret provided as a parameter in the template.
  • Azure SQL Database is created with the name demoSqlDB, along with a SQL Server named demoSqlServer.
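The naming pattern in the example can be sketched as a function of the prefix, which helps when choosing a prefix. This is only an illustration of the pattern described above, not the templates' actual logic; the function names are hypothetical. The storage name check reflects the general Azure rule that storage account names are 3 to 24 lowercase letters and digits.

```python
import re


def template_resource_names(prefix):
    """Derive resource names from a prefix, following the "demo" example."""
    return {
        "msi": f"{prefix}MSI",
        "storage_account": f"{prefix}store",
        "container": f"{prefix}container",
        "key_vault": f"{prefix}KeyVault",
        "sql_database": f"{prefix}SqlDB",
        "sql_server": f"{prefix}SqlServer",
    }


def is_valid_storage_account_name(name):
    """Storage account names must be 3-24 lowercase letters or digits."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


names = template_resource_names("demo")
print(names)
print(is_valid_storage_account_name(names["storage_account"]))
```

Which of these resources are actually created depends on the template you select.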

Prerequisites by workload

Trino

Create the following resources:
  1. Managed Service Identity (MSI): user-assigned managed identity.

Deploy Trino to Azure

Flink

Create the following resources:
  1. Managed Service Identity (MSI): user-assigned managed identity.
  2. ADLS Gen2 storage account and a container.

Role assignments:
  1. Assigns the “Storage Blob Data Owner” role to the user-assigned MSI on the storage account.

Deploy Apache Flink to Azure

Spark

Create the following resources:
  1. Managed Service Identity (MSI): user-assigned managed identity.
  2. ADLS Gen2 storage account and a container.

Role assignments:
  1. Assigns the “Storage Blob Data Owner” role to the user-assigned MSI on the storage account.

Deploy Spark to Azure

Trino, Flink, or Spark with Hive Metastore (HMS)

Create the following resources:
  1. Managed Service Identity (MSI): user-assigned managed identity.
  2. ADLS Gen2 storage account and a container.
  3. Azure SQL Server and SQL Database.
  4. Azure Key Vault and a secret to store SQL Server admin credentials.

Role assignments:
  1. Assigns the “Storage Blob Data Owner” role to the user-assigned MSI on the storage account.
  2. Assigns the “Key Vault Secrets User” role to the user-assigned MSI on the Key Vault.

Deploy Trino HMS to Azure

Note

Using these ARM templates requires permission to create new resources and to assign roles to resources in the subscription.

Using Azure portal

Create user-assigned managed identity (MSI)

A managed identity is an identity registered in Microsoft Entra ID whose credentials are managed by Azure. With managed identities, you don't need to register service principals in Microsoft Entra ID to maintain credentials such as certificates.

HDInsight on AKS relies on user-assigned MSI for communication among different components.

Create storage account – ADLS Gen2

The storage account is used as the default location for cluster logs and other outputs. Enable hierarchical namespace during storage account creation to use it as ADLS Gen2 storage.

  1. Assign a role: Assign the “Storage Blob Data Owner” role on this storage account to the user-assigned MSI you created.

  2. Create a container: After creating the storage account, create a container in the storage account.

Note

You can also create a container during cluster creation.
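If you script the storage account instead of using the portal, hierarchical namespace corresponds to the `isHnsEnabled` property of a `Microsoft.Storage/storageAccounts` resource. The following is a minimal sketch that builds such a resource fragment as JSON. The function name, the API version, the SKU, and the sample name and region are assumptions for illustration, not values taken from the HDInsight on AKS templates.

```python
import json


def adls_gen2_storage_resource(name, location):
    """Build an ARM-style storage account resource with hierarchical namespace on."""
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2023-01-01",        # assumed API version
        "name": name,
        "location": location,
        "sku": {"name": "Standard_LRS"},   # assumed SKU
        "kind": "StorageV2",
        "properties": {"isHnsEnabled": True},  # hierarchical namespace (ADLS Gen2)
    }


# Placeholder name and region for illustration.
print(json.dumps(adls_gen2_storage_resource("demostore", "westus2"), indent=2))
```

The role assignment and the container still need to be created separately, as described in the steps above.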

Create Azure SQL Database

Create an Azure SQL Database to use as an external metastore during cluster creation, or use an existing SQL Database. In either case, ensure the following properties are set.

Properties that must be enabled for SQL Server and SQL Database:

Resource Type | Property | Description
SQL Server | Authentication method | While creating a SQL Server, set "Authentication method" as shown in the screenshot. (Screenshot showing how to select authentication method.)
SQL Database | Allow Azure services and resources to access this server | Enable this property under the Networking blade of your SQL database in the Azure portal.

Note

  • Currently, only Azure SQL Database is supported as the inbuilt metastore.
  • Due to a Hive limitation, the "-" (hyphen) character is not supported in the metastore database name.
  • Azure SQL Database should be in the same region as your cluster.
  • You can also create a SQL Database during cluster creation. However, you need to refresh the cluster creation page for the newly created database to appear in the dropdown list.
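Two of these constraints, the hyphen restriction and the region match, are easy to check before cluster creation. The following is a minimal sketch with a hypothetical function name. Region comparison is simplified to ignoring case and spaces, so that "West US 2" and "westus2" are treated as the same region.

```python
def metastore_issues(database_name, database_region, cluster_region):
    """Return a list of problems with a planned metastore database."""

    def normalize(region):
        # Simplification: treat "West US 2" and "westus2" as equal.
        return region.replace(" ", "").lower()

    issues = []
    if "-" in database_name:
        issues.append("database name contains a hyphen")
    if normalize(database_region) != normalize(cluster_region):
        issues.append("database and cluster are in different regions")
    return issues


# Placeholder names and regions for illustration.
print(metastore_issues("hive-db", "West US 2", "westus2"))
print(metastore_issues("hivedb", "westus2", "westus2"))
```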

Create Azure Key Vault

Key Vault lets you store the SQL Server admin password set during SQL Database creation. The HDInsight on AKS platform doesn’t handle the credential directly, so you need to store your important credentials in Key Vault.

  1. Assign a role: Assign the “Key Vault Secrets User” role on this Key Vault to the user-assigned MSI created as part of the necessary resources.

  2. Create a secret: This step stores your SQL Server admin password as a secret in Azure Key Vault. Enter your password in the “Value” field while creating the secret.

Note

  • Note the secret name; it is required during cluster creation.
  • You need the “Key Vault Administrator” role assigned to your identity or account to add a secret in the Key Vault using the Azure portal. Navigate to the Key Vault and follow the steps on how to assign the role.
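Because the secret name is needed again at cluster creation, it can help to validate the name you plan to use up front. The following is a minimal sketch based on the general Key Vault naming rule that secret names are 1 to 127 characters long and contain only letters, digits, and hyphens. The function name and the sample secret names are hypothetical.

```python
import re


def is_valid_secret_name(name):
    """Check a name against the Key Vault secret naming rule."""
    return re.fullmatch(r"[0-9A-Za-z-]{1,127}", name) is not None


# Placeholder secret names for illustration.
for candidate in ("sql-admin-password", "sql_admin_password"):
    print(candidate, "->", is_valid_secret_name(candidate))
```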