Upgrade Azure Blob Storage with Azure Data Lake Storage Gen2 capabilities
This article helps you enable a hierarchical namespace and unlock capabilities such as file- and directory-level security and faster operations. These capabilities are widely used by big data analytics workloads and are referred to collectively as Azure Data Lake Storage Gen2.
An upgrade is one-way. There's no way to revert your account once you've performed the upgrade. We recommend that you validate your upgrade in a nonproduction environment.
Prepare to upgrade
Review feature support
Your account might be configured to use features that aren't yet supported in accounts that have Data Lake Storage Gen2 enabled. If your account uses such a feature, the upgrade won't pass the validation step. Review the Blob Storage feature support in Azure Storage accounts article to identify unsupported features, and disable any that your account uses before you begin the upgrade.
Note
Blob soft delete isn't yet supported by the upgrade process. Disable blob soft delete and then allow all soft-deleted blobs to expire before you upgrade the account.
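If you manage the account with the Azure CLI, the following bash sketch shows one way to turn off blob soft delete and blob versioning by using az storage account blob-service-properties update. Versioning is included because it is one of the incompatible features that validation can report. The run and disable_upgrade_blockers helpers and the DRY_RUN variable are hypothetical conveniences and are not part of the CLI. The sketch doesn't wait for existing soft-deleted blobs to expire, and it doesn't handle any other unsupported features your account might use.

```bash
#!/usr/bin/env bash
# Sketch only: run/disable_upgrade_blockers and DRY_RUN are hypothetical helpers.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# disable_upgrade_blockers <storage-account-name> <resource-group-name>
# Turns off blob soft delete and blob versioning on the account.
# Soft-deleted blobs that already exist must still expire before you upgrade.
disable_upgrade_blockers() {
  local account="$1" group="$2"
  run az storage account blob-service-properties update \
    --account-name "$account" \
    --resource-group "$group" \
    --enable-delete-retention false \
    --enable-versioning false
}
```

Running DRY_RUN=1 disable_upgrade_blockers <storage-account-name> <resource-group-name> prints the command without running it. To apply the change, leave DRY_RUN unset.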
Ensure that the segments of each blob path are named
The migration process creates a directory for each path segment of a blob. Data Lake Storage Gen2 directories must have a name, so the migration succeeds only if every path segment in a virtual directory is named. The same requirement applies to segments whose name is only a space character. If any path segments are either unnamed (//) or named only with a space character (_), copy those blobs to a new path that meets these naming requirements before you proceed with the migration.
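To find the blobs that need to be copied, you can check each blob name against the two cases described above. The following bash sketch uses hypothetical helper functions (has_unnamed_segment and find_incompatible_blobs). It flags only empty segments (//) and segments made up only of spaces, and it assumes that blob names arrive one per line on standard input. It reports the affected names but doesn't copy anything.

```bash
#!/usr/bin/env bash
# Sketch only: has_unnamed_segment and find_incompatible_blobs are hypothetical.

# has_unnamed_segment <blob-name>
# Succeeds (exit 0) if the name has an empty segment or a space-only segment.
has_unnamed_segment() {
  local name="$1"
  local space_only='/ +/'
  # An unnamed segment shows up as two consecutive slashes.
  [[ "$name" == *//* ]] && return 0
  # Wrap the name in slashes so the first and last segments are also bounded.
  [[ "/$name/" =~ $space_only ]] && return 0
  return 1
}

# find_incompatible_blobs
# Reads blob names (one per line) from stdin and prints the ones to copy.
find_incompatible_blobs() {
  local name
  # The extra test also processes a final line that lacks a trailing newline.
  while IFS= read -r name || [[ -n "$name" ]]; do
    if has_unnamed_segment "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

You could, for example, pipe the output of az storage blob list --account-name <storage-account-name> --container-name <container-name> --auth-mode login --query "[].name" -o tsv into find_incompatible_blobs. For large containers, check that command's paging behavior, because by default it might not return every blob.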
Perform the upgrade
Portal
Locate your storage account and display the account overview.
Select Data Lake Gen2 migration.
The Upgrade to a Storage account with Azure Data Lake Gen2 capabilities configuration page appears.
Expand the Step 1: Review account changes before upgrading section and click Review and agree to changes.
In the Review account changes page, select the checkbox and then click Agree to changes.
Expand the Step 2: Validate account before upgrading section and then click Start validation.
If validation fails, an error appears in the page. In some cases, a View errors link appears. If that link appears, select it.
Then, from the context menu of the error.json file, select Download.
Open the downloaded file to determine why the account did not pass the validation step. For example, if the file indicates that an incompatible feature is enabled on the account, disable that feature and then start the validation process again.
After your account has been successfully validated, expand the Step 3: Upgrade account section, and then click Start upgrade.
Important
Write operations are disabled while your account is being upgraded. Read operations aren't disabled, but we strongly recommend that you suspend read operations as they might destabilize the upgrade process.
When the migration has completed successfully, a confirmation message appears.
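PowerShell
Validate your storage account by using the following command.
$result = Invoke-AzStorageAccountHierarchicalNamespaceUpgrade -ResourceGroupName <resource-group-name> -Name <storage-account-name> -RequestType Validation -AsJob
Replace the <resource-group-name> placeholder value with the name of your resource group.
Replace the <storage-account-name> placeholder value with the name of your storage account.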
Depending on the size of your account, this process can take some time. You can use the AsJob switch to run the command in a background job so that your client isn't blocked. The command runs remotely, but the job exists on your local machine or on the VM from which you run the command, and the results are transmitted to that machine.
To check the status of the job and display all of its properties in a list, pipe the return variable to the Format-List cmdlet.
$result | Format-List -Property *
If the validation succeeds, the State property will be set to Completed.
If validation fails, the State property will be set to Failed, and the Error property will show validation errors.
For example, if the Error property indicates that an incompatible feature is enabled on the account, disable that feature and then start the validation process again.
In some cases, the Error property instead provides a path to a file named error.json. Open that file to determine why the account did not pass the validation step, fix the reported issue, and then start the validation process again.
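Upgrade your storage account by using the following command.
$result = Invoke-AzStorageAccountHierarchicalNamespaceUpgrade -ResourceGroupName <resource-group-name> -Name <storage-account-name> -RequestType Upgrade -Force -AsJob
Like the validation command, this example uses the AsJob switch to run the command in a background job. The Force switch overrides prompts to confirm the upgrade. If you don't use the AsJob switch, you don't need the Force switch because you can respond to the prompts yourself.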
Important
Write operations are disabled while your account is being upgraded. Read operations aren't disabled, but we strongly recommend that you suspend read operations as they might destabilize the upgrade process.
To check the status of the job, use the same techniques that you used to check the validation job. While the upgrade runs, the State property is set to Running.
When the migration has completed successfully, the State property will be set to Completed and the Error property will not show any errors.
Important
As a rough estimate, the upgrade takes 5-10 minutes per 2 million blobs. For example, an account that has 10 million blobs takes approximately 25-50 minutes to upgrade. Accounts that contain fewer than 2 million blobs typically upgrade in less than 10 minutes.
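The rule of thumb above is simple arithmetic. The following bash sketch, which uses a hypothetical estimate_upgrade_minutes function, multiplies the 5-10 minute range by the number of 2-million-blob units. The sketch rounds partial units up. That rounding is a conservative assumption of this sketch and is not part of the guidance, so any account with fewer than 2 million blobs gets the 5-10 minute range.

```bash
#!/usr/bin/env bash
# Sketch only: estimate_upgrade_minutes is a hypothetical helper.

# estimate_upgrade_minutes <blob-count>
# Prints "low-high" minutes, using 5-10 minutes per 2 million blobs.
estimate_upgrade_minutes() {
  local blobs="$1"
  local per=2000000
  # Round up to whole 2-million-blob units (a conservative assumption).
  local units=$(( (blobs + per - 1) / per ))
  if (( units < 1 )); then
    units=1
  fi
  echo "$(( units * 5 ))-$(( units * 10 ))"
}
```

For example, estimate_upgrade_minutes 10000000 prints 25-50, which matches the example above.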
Azure CLI
First, open the Azure Cloud Shell, or if you've installed the Azure CLI locally, open a command console application such as Windows PowerShell.
Verify that the version of Azure CLI that you have installed is 2.29.0 or higher by using the following command.
az --version
If your version of Azure CLI is lower than 2.29.0, then install the latest version. For more information, see Install the Azure CLI.
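If you script this check, the following bash sketch compares two version strings with sort -V, which GNU coreutils provides. The version_at_least function is hypothetical. You supply the installed version yourself, for example the azure-cli value that az version reports.

```bash
#!/usr/bin/env bash
# Sketch only: version_at_least is a hypothetical helper.

# version_at_least <installed> <required>
# Succeeds if <installed> is the same as or newer than <required>.
version_at_least() {
  local installed="$1" required="$2"
  # sort -V compares each dot-separated component numerically,
  # so 2.9.0 sorts before 2.29.0.
  [[ "$(printf '%s\n' "$installed" "$required" | sort -V | head -n 1)" == "$required" ]]
}
```

For example, version_at_least 2.28.0 2.29.0 returns a non-zero status, so a script can stop at that point and prompt for a newer Azure CLI.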
If your identity is associated with more than one subscription, then set your active subscription.
az account set --subscription <subscription-id>
Replace the <subscription-id> placeholder value with the ID of your subscription.
Validate your storage account by using the following command.
az storage account hns-migration start --type validation -n <storage-account-name> -g <resource-group-name>
Replace the <resource-group-name> placeholder value with the name of your resource group.
Replace the <storage-account-name> placeholder value with the name of your storage account.
If the validation succeeds, the process completes and no errors appear.
If validation fails, a validation error will appear in the console. For example, the error (IncompatibleValuesForAccountProperties) Values for account properties are incompatible: Versioning Enabled indicates that an incompatible feature (Versioning) is enabled on the account. In this case, you would disable the feature and then start the validation process again.
In some cases, the path to a file named error.json appears in the console. You can open that file to determine why the account did not pass the validation step.
For example, if the file indicates that an incompatible feature is enabled on the account, disable that feature and then start the validation process again.
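If you run validation from a script, you can capture the console output and extract the feature that an IncompatibleValuesForAccountProperties error names. The following bash sketch assumes the message format shown in the Versioning example above, where the feature follows the text "incompatible: ". The incompatible_feature function is hypothetical. It doesn't parse other errors, and it won't work if the message wording changes.

```bash
#!/usr/bin/env bash
# Sketch only: incompatible_feature is a hypothetical helper that assumes
# the "(IncompatibleValuesForAccountProperties) ... incompatible: <feature>" format.

# incompatible_feature <error-text>
# Prints the text after "incompatible: ", or returns 1 if the error code is absent.
incompatible_feature() {
  local text="$1"
  [[ "$text" == *IncompatibleValuesForAccountProperties* ]] || return 1
  # Remove everything up to and including the last "incompatible: ".
  printf '%s\n' "${text##*incompatible: }"
}
```

For example, err=$(az storage account hns-migration start --type validation -n <storage-account-name> -g <resource-group-name> 2>&1) captures the output. For the example error above, incompatible_feature "$err" would then print Versioning Enabled.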
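Start the upgrade by using the following command.
az storage account hns-migration start --type upgrade -n <storage-account-name> -g <resource-group-name>
Important
Write operations are disabled while your account is being upgraded. Read operations aren't disabled, but we strongly recommend that you suspend read operations as they might destabilize the upgrade process.
If the migration succeeds, the process completes and no errors appear.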
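You can chain the validation and upgrade commands so that the upgrade starts only after validation succeeds. The following bash sketch relies on the Azure CLI returning a non-zero exit status when a command fails. The run and upgrade_to_hns helpers and the DRY_RUN variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: run/upgrade_to_hns and DRY_RUN are hypothetical helpers.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# upgrade_to_hns <storage-account-name> <resource-group-name>
# Validates the account first and starts the upgrade only if validation succeeds.
upgrade_to_hns() {
  local account="$1" group="$2"
  if ! run az storage account hns-migration start --type validation -n "$account" -g "$group"; then
    echo "Validation failed for $account; fix the reported issues and validate again." >&2
    return 1
  fi
  run az storage account hns-migration start --type upgrade -n "$account" -g "$group"
}
```

Running DRY_RUN=1 upgrade_to_hns <storage-account-name> <resource-group-name> prints both commands in order without running them.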
To stop the upgrade before it completes, use the az storage account hns-migration stop command.
az storage account hns-migration stop -n <storage-account-name> -g <resource-group-name>
Migrate data, workloads, and applications
Configure services in your workloads to point to either the Blob service endpoint or the Data Lake Storage endpoint.
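For an account in the Azure public cloud, the two endpoints differ only in the service name in the host. The following bash sketch, which uses a hypothetical storage_endpoints function, prints both. It assumes the core.windows.net suffix, which doesn't apply to sovereign clouds.

```bash
#!/usr/bin/env bash
# Sketch only: storage_endpoints is a hypothetical helper (public-cloud suffix assumed).

# storage_endpoints <storage-account-name>
# Prints the Blob service endpoint, then the Data Lake Storage (dfs) endpoint.
storage_endpoints() {
  local account="$1"
  printf 'https://%s.blob.core.windows.net\n' "$account"
  printf 'https://%s.dfs.core.windows.net\n' "$account"
}
```

To see the endpoints that Azure reports for your account, you can run az storage account show -n <storage-account-name> -g <resource-group-name> --query primaryEndpoints and look at the blob and dfs entries.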
For Hadoop workloads that use the Windows Azure Storage Blob (WASB) driver, modify them to use the Azure Blob File System (ABFS) driver. Unlike the WASB driver, which makes requests to the Blob service endpoint, the ABFS driver makes requests to the Data Lake Storage endpoint of your account.
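When you switch drivers, you usually also need to change the storage URIs in job configurations. The following bash sketch rewrites a wasb:// or wasbs:// URI into the matching abfs:// or abfss:// URI on the dfs endpoint. The wasb_to_abfs function is hypothetical and assumes the public-cloud suffix. It doesn't change credentials or any other Hadoop configuration that the ABFS driver needs.

```bash
#!/usr/bin/env bash
# Sketch only: wasb_to_abfs is a hypothetical helper (public-cloud suffix assumed).

# wasb_to_abfs <uri>
# wasbs://<container>@<account>.blob.core.windows.net/<path>
#   -> abfss://<container>@<account>.dfs.core.windows.net/<path>
wasb_to_abfs() {
  local uri="$1"
  # Rewrite the secure scheme first so "wasbs" isn't left as "abfss" plus a stray "s".
  printf '%s\n' "$uri" | sed -E \
    -e 's#^wasbs://#abfss://#' \
    -e 's#^wasb://#abfs://#' \
    -e 's#\.blob\.core\.windows\.net#.dfs.core.windows.net#'
}
```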
Test custom applications to ensure that they work as expected with your upgraded account.
Multi-protocol access on Data Lake Storage enables most applications to continue using Blob APIs without modification. If you encounter issues or you want to use APIs to work with directory operations and ACLs, consider moving some of your code to use Data Lake Storage Gen2 APIs. See guides for .NET, Java, Python, Node.js, and REST.
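As an illustration of calling a Data Lake Storage Gen2 API directly: the REST Path - Create operation creates a directory when you send a PUT request to the dfs endpoint with resource=directory. The following bash sketch builds only that URL. The dfs_directory_create_url function is hypothetical, and it assumes the public-cloud suffix and path names that need no URL encoding. A real request also needs authorization and x-ms-version headers.

```bash
#!/usr/bin/env bash
# Sketch only: dfs_directory_create_url is a hypothetical helper.
# Assumes the public-cloud dfs suffix and names that need no URL encoding.

# dfs_directory_create_url <storage-account-name> <file-system> <directory-path>
dfs_directory_create_url() {
  local account="$1" fs="$2" dir="$3"
  # Drop a leading slash so the URL doesn't contain "//".
  dir="${dir#/}"
  printf 'https://%s.dfs.core.windows.net/%s/%s?resource=directory\n' "$account" "$fs" "$dir"
}
```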
Test any custom scripts to ensure that they work as expected with your upgraded account.
As with Blob APIs, many of your scripts will likely work without modification. If needed, you can update script files to use Data Lake Storage Gen2 PowerShell cmdlets and Azure CLI commands.
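For example, a script can use the az storage fs commands, which target Data Lake Storage Gen2, to create a directory and set its ACL. In the following bash sketch, the run and make_dir_with_acl helpers and the DRY_RUN variable are hypothetical. The --auth-mode login option assumes that your signed-in identity has data access to the account.

```bash
#!/usr/bin/env bash
# Sketch only: run/make_dir_with_acl and DRY_RUN are hypothetical helpers.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_dir_with_acl <storage-account-name> <file-system> <directory> <acl>
make_dir_with_acl() {
  local account="$1" fs="$2" dir="$3" acl="$4"
  # Create the directory in the file system (container).
  run az storage fs directory create -n "$dir" -f "$fs" --account-name "$account" --auth-mode login || return 1
  # Apply a POSIX-style ACL such as user::rwx,group::r-x,other::--- to it.
  run az storage fs access set --acl "$acl" -p "$dir" -f "$fs" --account-name "$account" --auth-mode login
}
```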