Migrate Azure Data Lake Storage from Gen1 to Gen2 by using the Azure portal

On Feb. 29, 2024, Azure Data Lake Storage Gen1 will be retired. For more information, see the official announcement. If you use Azure Data Lake Storage Gen1, make sure to migrate to Azure Data Lake Storage Gen2 prior to that date.

This article shows you how to simplify the migration by using the Azure portal. You can provide your consent in the Azure portal and then migrate your data and metadata (such as timestamps and ACLs) automatically from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2. For easier reading, this article uses the term Gen1 to refer to Azure Data Lake Storage Gen1, and the term Gen2 to refer to Azure Data Lake Storage Gen2.

Note

Your account may not qualify for portal-based migration based on certain constraints. If the Migrate data button is not enabled in the Azure portal for your Gen1 account and you have a support plan, you can file a support request. You can also get answers from community experts in Microsoft Q&A.

Warning

Azure Data Lake Storage Gen2 doesn't support Azure Data Lake Analytics. If you're using Azure Data Lake Analytics, you'll need to migrate those workloads before proceeding. See Migrate Azure Data Lake Analytics workloads for more information.

To migrate to Gen2 using the Azure portal, follow these steps:

✔️ Step 1: Assess readiness

✔️ Step 2: Create a storage account with Gen2 capabilities

✔️ Step 3: Migrate data using the Azure portal

✔️ Step 4: Migrate workloads and applications

Before you start, be sure to read the general guidance on how to migrate from Gen1 to Gen2 in Azure Data Lake Storage migration guidelines and patterns.

Create a storage account with Gen2 capabilities

Azure Data Lake Storage Gen2 is not a dedicated storage account or service type. It's a set of capabilities that you can obtain by enabling the Hierarchical namespace feature of an Azure storage account. To create an account that has Gen2 capabilities, see Create a storage account to use with Azure Data Lake Storage Gen2.

As you create the account, make sure to configure settings with the following values.

Setting                 Value
Storage account name    Any name that you want. The name doesn't have to match the name of your Gen1 account, and the account can be in any subscription of your choice.
Location                The same region used by the Data Lake Storage Gen1 account
Replication             LRS or ZRS
Minimum TLS version     1.0
NFS v3                  Disabled
Hierarchical namespace  Enabled
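
As a rough pre-flight aid, the values in the table above can be checked against a planned configuration before you create the account. The following is a minimal sketch only: the dictionary keys (location, replication, and so on) and the function name check_settings are hypothetical names chosen for illustration and do not correspond to any Azure SDK or portal API.

```python
# Allowed values taken from the settings table above.
# The key names are hypothetical labels for this sketch, not Azure API fields.
REQUIRED_VALUES = {
    "replication": {"LRS", "ZRS"},
    "minimum_tls_version": {"1.0"},
    "nfs_v3": {"Disabled"},
    "hierarchical_namespace": {"Enabled"},
}


def check_settings(planned: dict, gen1_region: str) -> list:
    """Return a list of problems found in the planned Gen2 account settings."""
    problems = []
    # The Gen2-enabled account must be in the same region as the Gen1 account.
    if planned.get("location") != gen1_region:
        problems.append(f"location must be {gen1_region!r}")
    for key, allowed in REQUIRED_VALUES.items():
        if planned.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    planned = {
        "location": "eastus2",
        "replication": "GRS",  # not allowed for the migration target
        "minimum_tls_version": "1.0",
        "nfs_v3": "Disabled",
        "hierarchical_namespace": "Enabled",
    }
    for problem in check_settings(planned, gen1_region="eastus2"):
        print(problem)
```

An empty list means the planned values match the table. The check does not cover settings that the migration tool leaves for you to configure manually.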

Note

The migration tool in the Azure portal doesn't move account settings. Therefore, after you've created the account, you'll have to manually configure settings such as encryption, network firewalls, and data protection.

Important

Ensure that you use a fresh, newly created storage account that has no history of use. Don't migrate to a previously used account or use an account in which containers have been deleted to make the account empty.

Verify RBAC role assignments

For Gen2, ensure that the Storage Blob Data Owner role has been assigned to your Azure Active Directory (Azure AD) user identity in the scope of the storage account, parent resource group, or subscription.

For Gen1, ensure that the Owner role has been assigned to your Azure AD identity in the scope of the Gen1 account, parent resource group, or subscription.

Migrate Azure Data Lake Analytics workloads

Azure Data Lake Storage Gen2 doesn't support Azure Data Lake Analytics. Azure Data Lake Analytics will be retired on February 29, 2024. If you attempt to use the Azure portal to migrate an Azure Data Lake Storage Gen1 account that is used for Azure Data Lake Analytics, it's possible that you'll break your Azure Data Lake Analytics workloads. You must first migrate your Azure Data Lake Analytics workloads to Azure Synapse Analytics or another supported compute platform before attempting to migrate your Gen1 account.

For more information, see Manage Azure Data Lake Analytics using the Azure portal.

Prepare the Gen1 account

File or directory names with only spaces or tabs, ending with a ., containing a :, or with multiple consecutive forward slashes (//) are not compatible with Gen2. You'll need to rename these files or directories before you migrate.
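
To find names that need renaming, the rules above can be applied to a list of Gen1 paths. The following is a minimal sketch, assuming you have already exported the paths as plain strings (for example, from a directory listing). The function name gen2_name_issues is hypothetical, and the sketch does not call any Gen1 or Gen2 API.

```python
def gen2_name_issues(path: str) -> list:
    """Return the reasons a Gen1 path is not compatible with Gen2."""
    issues = []
    if "//" in path:
        issues.append("multiple consecutive forward slashes")
    # Check every non-empty segment, because the rules apply to both
    # directory names and file names.
    for segment in filter(None, path.split("/")):
        if segment.strip(" \t") == "":
            issues.append(f"name with only spaces or tabs: {segment!r}")
        if segment.endswith("."):
            issues.append(f"name ending with '.': {segment!r}")
        if ":" in segment:
            issues.append(f"name containing ':': {segment!r}")
    return issues


if __name__ == "__main__":
    for candidate in ["FolderRoot/FileName.csv", "logs/report.", "data/12:30.txt"]:
        print(candidate, gen2_name_issues(candidate))
```

An empty list means none of the four rules matched for that path.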

Perform the migration

Before you begin, review the two migration options below, and decide whether to only copy data from Gen1 to Gen2 (recommended) or perform a complete migration.

Note

No matter which option you select, a container named gen1 will be created in the Gen2-enabled account, and all data from the Gen1 account will be copied to this new gen1 container. When the migration is complete, in order to find the data on a path that existed on Gen1, you must add the prefix gen1/ to the same path to access it on Gen2. For example, a path that was named 'FolderRoot/FolderChild/FileName.csv' on Gen1 will be available at 'gen1/FolderRoot/FolderChild/FileName.csv' on Gen2. Container names can't be renamed on Gen2, so this gen1 container on Gen2 can't be renamed post migration. However, the data can be copied to a new container in Gen2 if needed.
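
The path mapping described in this note can be sketched as a small helper. This is an illustration only: the function name gen2_path is hypothetical, and the default container name gen1 is the one that the note says the migration creates.

```python
def gen2_path(gen1_path: str, container: str = "gen1") -> str:
    """Prefix a Gen1 path with the container that the migration creates."""
    # Drop any leading slash so the result has exactly one separator.
    return f"{container}/{gen1_path.lstrip('/')}"


if __name__ == "__main__":
    print(gen2_path("FolderRoot/FolderChild/FileName.csv"))
```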

Choose a migration option

Option 1: Copy data only (recommended). In this option, data will be copied from Gen1 to Gen2. As the data is being copied, the Gen1 account will become read-only. After the data is copied, both the Gen1 and Gen2 accounts will be accessible. However, you must update the applications and compute workloads to use the new ADLS Gen2 endpoint.

Option 2: Perform a complete migration. In this option, data will be copied from Gen1 to Gen2. After the data is copied, all the traffic from the Gen1 account will be redirected to the Gen2-enabled account. Redirected requests will use the Gen1 compatibility layer to translate Gen1 API calls to Gen2 equivalents. During the migration, the Gen1 account will become read-only. After the migration is complete, the Gen1 account won't be accessible.

Whichever option you choose, after you've migrated and verified that all your workloads work as expected, you can delete the Gen1 account.

Option 1: Copy data from Gen1 to Gen2

  1. Sign in to the Azure portal to get started.

  2. Locate your Data Lake Storage Gen1 account and display the account overview.

  3. Select the Migrate data button.

    Button to migrate

  4. Select Copy data to a new Gen2 account.

    Copy data option

  5. Give Microsoft consent to perform the data migration by selecting the checkbox. Then, click the Apply button.

    Checkbox to provide consent

    Important

    While your data is being migrated, your Gen1 account becomes read-only and your Gen2-enabled account is disabled. When the migration is finished, you can read and write to both accounts.

    You can stop the migration at any time by selecting the Stop migration button.

    Stop migration option

Option 2: Perform a complete migration

  1. Sign in to the Azure portal to get started.

  2. Locate your Data Lake Storage Gen1 account and display the account overview.

  3. Select the Migrate data button.

    Migrate button

  4. Select Complete migration to a new Gen2 account.

    Complete migration option

  5. Give Microsoft consent to perform the data migration by selecting the checkbox. Then, click the Apply button.

    Consent checkbox

    Important

    While your data is being migrated, your Gen1 account becomes read-only and the Gen2-enabled account is disabled.

    Also, while the Gen1 URI is being redirected, both accounts are disabled.

    When the migration is finished, your Gen1 account will be disabled. The data in your Gen1 account won't be accessible and will be deleted after 30 days. Your Gen2 account will be available for reads and writes.

    You can stop the migration at any time before the URI is redirected by selecting the Stop migration button.

    Migration stop button

Migrate workloads and applications

  1. Configure services in your workloads to point to your Gen2 endpoint. For links to articles that help you configure Azure Databricks, HDInsight, and other Azure services to use Gen2, see Azure services that support Azure Data Lake Storage Gen2.

  2. Update applications to use Gen2 APIs. See these guides:

    Environment             Article
    Azure Storage Explorer  Use Azure Storage Explorer to manage directories and files in Azure Data Lake Storage Gen2
    .NET                    Use .NET to manage directories and files in Azure Data Lake Storage Gen2
    Java                    Use Java to manage directories and files in Azure Data Lake Storage Gen2
    Python                  Use Python to manage directories and files in Azure Data Lake Storage Gen2
    JavaScript (Node.js)    Use JavaScript SDK in Node.js to manage directories and files in Azure Data Lake Storage Gen2
    REST API                Azure Data Lake Store REST API

  3. Update scripts to use Data Lake Storage Gen2 PowerShell cmdlets and Azure CLI commands.

  4. Search for URI references that contain the string adl:// in code files, Databricks notebooks, Apache Hive HQL files, or any other files used as part of your workloads. Replace these references with the Gen2 formatted URI of your new storage account. For example, the Gen1 URI adl://mydatalakestore.azuredatalakestore.net/mydirectory/myfile might become abfss://myfilesystem@mydatalakestore.dfs.core.windows.net/mydirectory/myfile.
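
The URI replacement in the last step can be partly automated. The following is a minimal sketch that rewrites adl:// references in a string, assuming account names consist of lowercase letters and digits and that the data lives in a single container (gen1 by default, as created by the portal migration). The function name rewrite_adl_uris and its parameters are hypothetical. Review every replacement before you commit it.

```python
import re

# Matches adl://<account>.azuredatalakestore.net followed by an optional path.
ADL_URI = re.compile(
    r"adl://(?P<account>[a-z0-9]+)\.azuredatalakestore\.net(?P<path>/\S*)?"
)


def rewrite_adl_uris(text: str, gen2_account: str, container: str = "gen1") -> str:
    """Replace every Gen1 adl:// URI in text with an abfss:// URI."""

    def to_abfss(match: re.Match) -> str:
        # The path (and anything attached to it, such as a closing quote)
        # is carried over unchanged.
        path = match.group("path") or ""
        return f"abfss://{container}@{gen2_account}.dfs.core.windows.net{path}"

    return ADL_URI.sub(to_abfss, text)


if __name__ == "__main__":
    line = 'df = spark.read.csv("adl://mydatalakestore.azuredatalakestore.net/mydirectory/myfile")'
    print(rewrite_adl_uris(line, gen2_account="mydatalakestore", container="myfilesystem"))
```

The Gen2 account name is passed in explicitly because it doesn't have to match the Gen1 account name.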

Gen1 compatibility layer

This layer attempts to provide application compatibility between Gen1 and Gen2 as a convenience during the migration, so that applications can continue using Gen1 APIs to interact with data in the Gen2-enabled account. This layer has limited functionality, so if you use this approach as part of your migration, validate your workloads with test accounts first. The compatibility layer runs on the server, so there's nothing to install.

Important

Microsoft does not recommend this capability as a replacement for migrating your workloads and applications. Support for the Gen1 compatibility layer will end when Gen1 is retired on Feb. 29, 2024.

To minimize issues with the compatibility layer, make sure that your Gen1 SDKs use the following versions (or higher).

Language  SDK version
.NET      2.3.9
Java      1.1.21
Python    0.0.51
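
When you compare an installed SDK version against this table, compare the version parts numerically rather than as text, because a plain string comparison would rank 1.1.9 above 1.1.21. The following is a minimal sketch, assuming simple dotted numeric version strings. The names MIN_GEN1_SDK and meets_minimum are hypothetical.

```python
# Minimum Gen1 SDK versions from the table above, stored as integer tuples.
MIN_GEN1_SDK = {
    ".NET": (2, 3, 9),
    "Java": (1, 1, 21),
    "Python": (0, 0, 51),
}


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '1.1.21' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(language: str, installed: str) -> bool:
    """Return True if the installed Gen1 SDK is at or above the minimum."""
    return parse_version(installed) >= MIN_GEN1_SDK[language]


if __name__ == "__main__":
    print(meets_minimum("Java", "1.1.9"))
    print(meets_minimum("Python", "0.0.52"))
```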

The following functionality isn't supported in the compatibility layer.

  • ListStatus API option to ListBefore an entry.

  • ListStatus API with over 4,000 files without a continuation token.

  • Chunk-encoding for append operations.

  • Any API calls that use https://management.azure.com/ as the Azure Active Directory (Azure AD) token audience.

  • File or directory names with only spaces or tabs, ending with a ., containing a :, or with multiple consecutive forward slashes (//).

Frequently asked questions

How much does the data migration cost?

There is no cost to use the portal-based migration tool. However, you will be billed for usage of Azure Data Lake Gen1 and Gen2 services. During the data migration, you will be billed for the data storage and transactions of the Gen1 account.

Post migration, if you chose the option that copies only data, then you will be billed for the data storage and transactions for both Azure Data Lake Gen1 and Gen2 accounts. To avoid being billed for the Gen1 account, delete the Gen1 account after you've updated your applications to point to Gen2. If you chose to perform a complete migration, you will be billed only for the data storage and transactions of the Gen2-enabled account.

If providing consent fails, make sure all your Azure Data Lake Analytics accounts are migrated to Azure Synapse Analytics or another supported compute platform. Once the Azure Data Lake Analytics accounts are migrated, retry the consent. If the issue persists and you have a support plan, you can file a support request. You can also get answers from community experts in Microsoft Q&A.

After the migration completes, can I go back to using the Gen1 account?

If you used Option 1: Copy data from Gen1 to Gen2 mentioned above, then both the Gen1 and Gen2 accounts are available for reads and writes post migration. However, if you used Option 2: Perform a complete migration, then going back to the Gen1 account isn't supported. In Option 2, after the migration completes, the data in your Gen1 account won't be accessible and will be deleted after 30 days. You can continue to view the Gen1 account in the Azure portal, and when you're ready, you can delete the Gen1 account.

I would like to enable geo-redundant storage (GRS) on the Gen2-enabled account. How do I do that?

Once the migration is complete, with either the "Copy data" or the "Complete migration" option, you can change the redundancy option to GRS as long as you don't plan to use the application compatibility layer. The application compatibility layer will not work on accounts that use GRS redundancy.

Gen1 doesn't have containers and Gen2 has them. What should I expect?

When we copy the data over to your Gen2-enabled account, we automatically create a container named 'gen1'. Containers can't be renamed in Gen2, so if you need the data under a different container name, copy it to a new container in Gen2 after the migration.

What should I consider in terms of migration performance?

When you copy the data over to your Gen2-enabled account, two factors that can affect performance are the number of files and the amount of metadata you have. For example, many small files can affect the performance of the migration.

Will WebHDFS File System APIs be supported on the Gen2 account post migration?

WebHDFS File System APIs of Gen1 will be supported on Gen2 but with certain deviations, and only limited functionality is supported via the compatibility layer. Customers should plan to leverage ADLS Gen2-specific APIs for better performance and features.

Next steps