Azure Data Lake Storage - Non-Production environment

Gopinath Rajee
2022-05-16T02:21:18.033+00:00

All,

We use Azure Data Lake Storage Gen2 storage accounts (hierarchical namespace enabled) for all our production needs. We plan to enable GRS on these storage accounts. However, GRS is LRS (local) + LRS (remote).

What if the datacenter holding the LRS (local) copies shuts down for whatever reason (flooding, earthquake, etc.) without any wider regional failure that would otherwise warrant a failover to the remote location? Would I then have to rely on the LRS (remote) copy to recreate the storage accounts in the local region? In that case, would GZRS be a better option?

Thanks,
grajee

  1. Luke Murray, MVP Volunteer Moderator
    2022-05-16T08:20:22.25+00:00

    ZRS offers you the best redundancy within a single region, without failing over to a different region.

    At a minimum, there are three copies of the data (LRS), all stored in a single Azure datacenter. Regions that support availability zones usually have three zones; more information on which regions have zones is here: https://azure.microsoft.com/en-us/global-infrastructure/geographies/?WT.mc_id=AZ-MVP-5004796#geographies & https://learn.microsoft.com/en-us/azure/availability-zones/az-overview?WT.mc_id=AZ-MVP-5004796

    "LRS is the lowest-cost redundancy option and offers the least durability compared to other options. LRS protects your data against server rack and drive failures. However, if a disaster such as fire or flooding occurs within the data center, all replicas of a storage account using LRS may be lost or unrecoverable."

    ZRS spreads those three copies synchronously across three availability zones in the region; each zone is a physically separate location with its own datacenter(s).

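    GRS is LRS in the primary region, asynchronously replicated to a single datacenter in the secondary region (also LRS). GZRS combines the two: ZRS in the primary region plus the same asynchronous replication to LRS in the secondary region.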
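    To make the difference concrete for the scenario in the question (one primary-region datacenter is lost, but the region as a whole has not failed), here is a minimal Python sketch. It assumes the documented layouts (LRS: three copies in one datacenter; ZRS: copies across three zones; GRS = LRS primary + LRS secondary; GZRS = ZRS primary + LRS secondary) and models each zone as one datacenter. The labels "p1", "p2", "p3", "s1" and the function after_datacenter_loss are hypothetical illustrations, not anything Azure exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    region: str      # "primary" or "secondary"
    datacenter: str  # hypothetical datacenter label, e.g. "p1"


# Simplified model: each availability zone is represented by one datacenter label.
# These labels are hypothetical; Azure does not expose placement like this.
LRS_PRIMARY = [Copy("primary", "p1")] * 3                         # 3 copies, 1 datacenter
ZRS_PRIMARY = [Copy("primary", dc) for dc in ("p1", "p2", "p3")]  # 3 copies, 3 zones
LRS_SECONDARY = [Copy("secondary", "s1")] * 3                     # async copies, 1 datacenter

LAYOUTS = {
    "LRS": LRS_PRIMARY,
    "ZRS": ZRS_PRIMARY,
    "GRS": LRS_PRIMARY + LRS_SECONDARY,
    "GZRS": ZRS_PRIMARY + LRS_SECONDARY,
}


def after_datacenter_loss(option: str, lost: str = "p1") -> str:
    """Classify what happens when a single primary-region datacenter is lost."""
    survivors = [c for c in LAYOUTS[option] if c.datacenter != lost]
    if any(c.region == "primary" for c in survivors):
        return "still served from primary region"
    if survivors:
        return "failover to secondary region required"
    return "no copies left"


for option in LAYOUTS:
    print(f"{option:5} -> {after_datacenter_loss(option)}")
```

    In this simplified model, only ZRS and GZRS keep serving from the primary region after losing one datacenter; GRS still has the secondary copy but would need a failover to use it. Keep in mind that replication to the secondary region is asynchronous, so the most recent writes may not be there yet.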

    It's a conversation about risk and the type of data. Most commonly, LRS makes sense for dev/test workloads, and ZRS/GRS for production for additional resiliency (or at least ZRS/GRS for your backups). However, if some users also rely on your dev/test workloads as if they were production, ZRS makes sense there too.
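    The rule of thumb above can be written down as a small helper, which can be handy when reviewing many storage accounts at once. This is a hypothetical sketch: the names Env and candidate_redundancy are made up for illustration, and it encodes only the rough guidance in this answer, not official Azure recommendations.

```python
from enum import Enum
from typing import List


class Env(Enum):
    DEV_TEST = "dev/test"
    PRODUCTION = "production"


def candidate_redundancy(env: Env, used_by_real_users: bool = False,
                         holds_backups: bool = False) -> List[str]:
    """Hypothetical helper encoding the rule of thumb above (not official guidance)."""
    if env is Env.PRODUCTION or holds_backups:
        return ["ZRS", "GRS"]  # additional resiliency for production data and backups
    if used_by_real_users:
        return ["ZRS"]         # dev/test that some users depend on
    return ["LRS"]             # lowest-cost option for pure dev/test


print(candidate_redundancy(Env.DEV_TEST, used_by_real_users=True))
```

    The helper returns a list of candidates rather than a single answer because the final choice still depends on cost and on how much risk you are willing to accept for that data.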

    One more comment: consider every part of your architecture. There is no point making some parts GRS and other parts LRS or ZRS, because the application may not function or fail over correctly. You may end up either spending more than you need to, or needing to build more redundancy into the application itself.
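    One way to act on that advice is to list each component with its redundancy option and flag the ones that would stay behind if the rest failed over to the secondary region. A minimal Python sketch, assuming a simple name-to-option mapping; the component names and the function left_behind_on_failover are hypothetical placeholders.

```python
from typing import Dict, List

# Redundancy options that keep a copy in a secondary region.
GEO_REDUNDANT = {"GRS", "GZRS", "RA-GRS", "RA-GZRS"}


def left_behind_on_failover(components: Dict[str, str]) -> List[str]:
    """Return components with no secondary-region copy, if any other component has one."""
    if not any(option in GEO_REDUNDANT for option in components.values()):
        return []  # nothing fails over, so nothing is left behind
    return sorted(name for name, option in components.items()
                  if option not in GEO_REDUNDANT)


# Hypothetical architecture: the component names are placeholders.
architecture = {"data-lake": "GZRS", "backups": "GRS", "app-config": "LRS", "staging": "ZRS"}
print(left_behind_on_failover(architecture))
```

    Here "app-config" and "staging" would be flagged: either they also need a secondary-region copy, or the application must be able to rebuild them after a failover.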
