Create a cluster with Data Lake Storage Gen2 using the Azure portal

The Azure portal is a web-based management tool for services and resources hosted in the Microsoft Azure cloud. In this article, you learn how to use the portal to create Linux-based Azure HDInsight clusters that use Data Lake Storage Gen2. For additional details, see Create HDInsight clusters.

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.

If you don't have an Azure subscription, create a free account before you begin.

To create an HDInsight cluster that uses Data Lake Storage Gen2 for storage, follow these steps to configure a storage account that has a hierarchical namespace.

Create a user-assigned managed identity

Create a user-assigned managed identity, if you don’t already have one.

  1. Sign in to the Azure portal.
  2. In the upper-left corner, click Create a resource.
  3. In the search box, type user assigned and click User Assigned Managed Identity.
  4. Click Create.
  5. Enter a name for your managed identity, and then select the correct subscription, resource group, and location.
  6. Click Create.

For more information on how managed identities work in Azure HDInsight, see Managed identities in Azure HDInsight.
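The portal steps above can also be expressed as an Azure CLI call. The following is a minimal sketch, assuming the Azure CLI (`az`) is installed and signed in; it only builds and prints the `az identity create` command rather than running it. The helper name `identity_create_command` and the sample values are hypothetical and are not part of the article.

```python
import shlex


def identity_create_command(name: str, resource_group: str, location: str) -> list[str]:
    """Build the argument list for `az identity create` (nothing is executed)."""
    fields = (("name", name), ("resource_group", resource_group), ("location", location))
    for label, value in fields:
        if not value.strip():
            raise ValueError(f"{label} must not be empty")
    return [
        "az", "identity", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
    ]


if __name__ == "__main__":
    # Placeholder values chosen for illustration only.
    cmd = identity_create_command("hdi-adls-identity", "my-resource-group", "eastus")
    print(shlex.join(cmd))
```

Printing the command instead of executing it lets you review the values before anything is created in your subscription.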

Create a storage account to use with Data Lake Storage Gen2

Create a storage account to use with Azure Data Lake Storage Gen2.

  1. Sign in to the Azure portal.
  2. In the upper-left corner, click Create a resource.
  3. In the search box, type storage and click storage account.
  4. Click Create.
  5. On the Create storage account screen:
    1. Select the correct subscription and resource group.
    2. Enter a name for your storage account with Data Lake Storage Gen2.
    3. Click on the Advanced tab.
    4. Click Enabled next to Hierarchical namespace under Data Lake Storage Gen2.
    5. Click Review + create.
    6. Click Create.

For more information on other options during storage account creation, see Quickstart: Create a storage account for Azure Data Lake Storage Gen2.

Screenshot showing storage account creation in the Azure portal.
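As a rough command-line counterpart to the storage account steps above, the sketch below validates the account name and builds an `az storage account create` command with the hierarchical namespace enabled. It assumes the Azure CLI is available and does not run the command. The helper names, the sample values, and the `Standard_LRS` SKU are assumptions for illustration; the article doesn't specify a redundancy option.

```python
import re
import shlex

# Azure storage account names are 3-24 characters, lowercase letters and digits only.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the storage account naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


def storage_create_command(name: str, resource_group: str, location: str) -> list[str]:
    """Build the argument list for a Data Lake Storage Gen2 capable account."""
    if not is_valid_storage_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--kind", "StorageV2",
        "--sku", "Standard_LRS",  # assumed SKU, not taken from the article
        # Equivalent of selecting Enabled for Hierarchical namespace in the portal.
        "--enable-hierarchical-namespace", "true",
    ]


if __name__ == "__main__":
    # Placeholder values chosen for illustration only.
    print(shlex.join(storage_create_command("hdiadlsgen2demo", "my-resource-group", "eastus")))
```

Checking the name locally catches the most common creation failure (uppercase letters, hyphens, or a name that is too long) before the request reaches Azure.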

Set up permissions for the managed identity on the Data Lake Storage Gen2

Assign the managed identity to the Storage Blob Data Owner role on the storage account.

  1. In the Azure portal, go to your storage account.

  2. Select Access control (IAM).

  3. Select Add > Add role assignment.

    Screenshot showing Access control (IAM) page with Add role assignment menu open.

  4. On the Role tab, select Storage Blob Data Owner.

    Screenshot showing Add role assignment page with Role tab selected.

  5. On the Members tab, select Managed identity, and then select Select members.

  6. Select your subscription, select User-assigned managed identity, and then select your user-assigned managed identity.

  7. On the Review + assign tab, select Review + assign to assign the role.

    The user-assigned identity that you selected is now listed under the selected role.

    For more information about role assignments, see Assign Azure roles using the Azure portal.

  8. After this initial setup is complete, you can create a cluster through the portal. The cluster must be in the same Azure region as the storage account. In the Storage tab of the cluster creation menu, select the following options:

    • For Primary storage type, select Azure Data Lake Storage Gen2.

    • Under Primary Storage account, search for and select the newly created storage account with Data Lake Storage Gen2 storage.

    • Under Identity, select the newly created user-assigned managed identity.

      Storage settings for using Data Lake Storage Gen2 with Azure HDInsight.

    Note

    • To add a secondary storage account with Data Lake Storage Gen2, assign the managed identity created earlier to the new Data Lake Storage Gen2 storage account at the storage account level. Adding a secondary storage account with Data Lake Storage Gen2 through the "Additional storage accounts" blade on HDInsight isn't supported.
    • You can enable RA-GRS or RA-ZRS on the Azure Blob storage account that HDInsight uses. However, creating a cluster against the RA-GRS or RA-ZRS secondary endpoint isn't supported.
    • HDInsight doesn't support Data Lake Storage Gen2 accounts configured as read-access geo-zone-redundant storage (RA-GZRS) or geo-zone-redundant storage (GZRS).
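The role assignment and the same-region requirement from the steps above can be sketched in code as follows. This is a minimal illustration, assuming the Azure CLI is available; it builds the storage account scope string and an `az role assignment create` command without running it. The helper names and sample IDs are hypothetical, and the region comparison is a simplification that treats a display name such as "East US" and a short name such as "eastus" as equal.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Return the Azure resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(principal_id: str, scope: str) -> list[str]:
    """Build the command that grants Storage Blob Data Owner on the given scope."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,  # principal ID of the user-assigned identity
        "--role", "Storage Blob Data Owner",
        "--scope", scope,
    ]


def same_region(cluster_region: str, storage_region: str) -> bool:
    """Compare two region names, ignoring case and spaces (a simplification)."""
    def normalize(region: str) -> str:
        return region.replace(" ", "").lower()
    return normalize(cluster_region) == normalize(storage_region)


if __name__ == "__main__":
    # Placeholder IDs chosen for illustration only.
    scope = storage_account_scope(
        "00000000-0000-0000-0000-000000000000", "my-resource-group", "hdiadlsgen2demo"
    )
    print(shlex.join(role_assignment_command("11111111-1111-1111-1111-111111111111", scope)))
    print(same_region("East US", "eastus"))
```

Scoping the assignment to the single storage account, rather than to the resource group or subscription, mirrors the portal steps, where the role is added from the storage account's Access control (IAM) page.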

Delete the cluster

See Delete an HDInsight cluster using your browser, PowerShell, or the Azure CLI.

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

You've successfully created an HDInsight cluster. Now learn how to work with your cluster.

Apache Spark clusters

Apache Hadoop clusters

Apache HBase clusters