Azure Data Lake Gen2 Disaster Recovery/Storage account failover Capabilities?

Adam Sebetich 1 Reputation point
2020-06-05T16:28:28.463+00:00

I am trying to wrap my brain around Azure Data Lake Gen 2 DR/Storage account failover capabilities and I feel as though I am getting conflicting information. On the Microsoft Documentation - https://learn.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance#unsupported-features-and-services it mentions that "ADLS Gen2 storage accounts (accounts that have hierarchical namespace enabled) are not supported at this time." However, on the Data lake gen 2 documentation - https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction it mentions that "high availability/disaster recovery capabilities" are possible due to Gen 2 being built on Azure blob storage.

Seems pretty conflicting, anyone have a clear answer? I am trying to ensure I pick the right data lake for my situation and DR/HA is top of mind.

Azure Data Lake Storage
Azure Storage Accounts
Azure Data Lake Analytics

2 answers

  1. Sumarigo-MSFT 43,561 Reputation points Microsoft Employee
    2020-06-09T06:41:22.787+00:00

    @Adam Sebetich Firstly, apologies for the delay in responding here and any inconvenience this issue may have caused.

    ADLS Gen2 is indeed built on top of Azure Blob Storage, but it has certain limitations, and account failover is one of them. The known issues and limitations are collected in one place: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues.

    Specifically, unsupported blob features can be found here - https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-supported-blob-storage-features

    Hope this helps!

    Kindly let us know if the above helps or you need further assistance on this issue.


  2. PRADEEPCHEEKATLA-MSFT 76,921 Reputation points Microsoft Employee
    2020-06-11T11:35:38.53+00:00

    @abhishekbohra-1763 Welcome to the Microsoft Q&A platform.

    For data resiliency with Data Lake Storage Gen2, it is recommended to geo-replicate your data via GRS or RA-GRS, whichever satisfies your HA/DR requirements. Additionally, you should consider ways for the application using Data Lake Storage Gen2 to automatically fail over to the secondary region, triggered by monitoring or by the length of a run of failed attempts, or at least to send a notification to admins for manual intervention. Keep in mind that there is a tradeoff between failing over and waiting for a service to come back online.

    [Image: 9822-adls-gen2-failover.jpg]
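
To make the "fail over after a run of failed attempts" idea concrete, here is a minimal sketch in Python. It is not an Azure SDK API: `read_with_fallback`, `read_primary`, `read_secondary`, and `notify` are hypothetical names, and the two reader callables stand in for whatever client calls your application makes against the primary and the read-access secondary endpoints.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_fallback(
    read_primary: Callable[[], T],
    read_secondary: Callable[[], T],
    max_primary_attempts: int = 3,
    notify: Optional[Callable[[str], None]] = None,
) -> T:
    """Try the primary a fixed number of times, then read from the secondary.

    All names here are illustrative assumptions, not part of any Azure library.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_primary_attempts):
        try:
            return read_primary()
        except Exception as exc:  # a real client would catch specific errors
            last_error = exc
    # The primary kept failing: tell an admin (if a hook was given), then
    # fall back to the secondary region, which is read-only.
    if notify is not None:
        notify(f"primary failed {max_primary_attempts} times: {last_error!r}")
    return read_secondary()
```

A production version would likely add a delay between attempts and a way to switch back once the primary recovers; both are left out here to keep the sketch short.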

    For protection against regional outages, configure your account for geo-redundant storage, with or without the option of read access from the secondary region. Microsoft recommends RA-GZRS for maximum availability and durability for your applications.

    Geo-redundant storage (GRS) or geo-zone-redundant storage (GZRS) copies your data asynchronously to a secondary geographic region that is at least hundreds of miles from the primary. If the primary region suffers an outage, the secondary region serves as a redundant source for your data. On accounts where customer-initiated failover is supported, you can initiate a failover to transform the secondary endpoint into the primary endpoint.
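
Because replication is asynchronous, the most recent writes may not yet exist in the secondary region. Geo-redundant accounts expose a Last Sync Time value; data written before it has been replicated, while later writes may not have been. The sketch below only shows the comparison and assumes you have already obtained that timestamp and your own record of write times. `paths_at_risk` and the sample paths are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List


def paths_at_risk(write_times: Dict[str, datetime], last_sync_time: datetime) -> List[str]:
    """Return paths written at or after the last sync time, sorted by name.

    These are the writes that may be missing from the secondary region.
    """
    return sorted(p for p, t in write_times.items() if t >= last_sync_time)


# Illustrative values only.
last_sync = datetime(2020, 6, 11, 12, 0, tzinfo=timezone.utc)
writes = {
    "raw/a.csv": datetime(2020, 6, 11, 11, 0, tzinfo=timezone.utc),
    "raw/b.csv": datetime(2020, 6, 11, 12, 30, tzinfo=timezone.utc),
}
at_risk = paths_at_risk(writes, last_sync)  # only raw/b.csv was written after the sync
```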

    Read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS) provides geo-redundant storage with the additional benefit of read access to the secondary endpoint. If an outage occurs in the primary endpoint, applications configured for read access to the secondary and designed for high availability can continue to read from the secondary endpoint.
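
With RA-GRS or RA-GZRS, the secondary endpoint is addressed by appending `-secondary` to the account name in the host name. The helper below is a small sketch of that naming convention, assuming the Azure public cloud suffix `core.windows.net`. `account_endpoints` and the account name `mydatalake` are hypothetical.

```python
from typing import Dict


def account_endpoints(account_name: str, service: str = "dfs") -> Dict[str, str]:
    """Build the primary and read-only secondary URLs for a storage account.

    Assumes the Azure public cloud; `service` is "dfs" (Data Lake) or "blob".
    """
    if service not in ("blob", "dfs"):
        raise ValueError(f"unsupported service: {service!r}")
    suffix = f"{service}.core.windows.net"
    return {
        "primary": f"https://{account_name}.{suffix}",
        "secondary": f"https://{account_name}-secondary.{suffix}",
    }


endpoints = account_endpoints("mydatalake")
```

An application designed for high availability would send reads to `endpoints["secondary"]` only when the primary is unavailable, since the secondary does not accept writes.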

    Microsoft automatically fails over to the secondary region when a problem occurs in the primary region; customers cannot control the failover from the primary region to the secondary region.

    Note: Azure Data Lake Storage Gen2 does not support customer-controlled failover between the primary and secondary regions.

    We are working on customer-controlled failover for both disaster and DR drill scenarios.

    I suggest you vote up an idea submitted by another Azure customer:

    Ability to manually initiate failover on Geo-Redundant storage accounts (GRS & RA-GRS)

    All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

    Hope this helps. Do let us know if you have any further queries.

