
Data Lake Organization Pattern

grajee 371 Reputation points
2021-06-02T19:19:23.757+00:00

What are the ways in which a data lake can be organized? I am asking specifically about zones and access control.

First Option:
RawZone (Container)\Retail\Full
RawZone (Container)\Retail\Incr

StageZone (Container)\Retail\Full
StageZone (Container)\Retail\Incr
CuratedZone (Container)\Retail\Global

RawZone (Container)\Wholesale\Full
RawZone (Container)\Wholesale\Incr
RawZone (Container)\Wholesale\ITD

StageZone (Container)\Wholesale\Full
StageZone (Container)\Wholesale\Incr

CuratedZone (Container)\Wholesale\Global

Second Option:
Retail (Container)\RawZone\Full
Retail (Container)\RawZone\Incr
Retail (Container)\RawZone\ITD
Retail (Container)\StageZone\Full
Retail (Container)\GoldZone\Global

Wholesale (Container)\RawZone\Full
Wholesale (Container)\RawZone\Incr
Wholesale (Container)\RawZone\ITD
Wholesale (Container)\GoldZone\Global

The subject areas are Retail, Wholesale, Online, Lease … Rentals.

In the first option, data files are organized starting with the zones.
In the second option, data files are organized starting with the subject area.
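
To make the difference between the two layouts concrete, here is a minimal sketch that generates the folder paths for each option. The zone names and load types are taken from the listings above; the `SUBJECT_AREAS` and `ZONES` values and the function names are illustrative assumptions, not part of any Azure API.

```python
# Illustrative inputs only: names follow the examples in the question.
SUBJECT_AREAS = ["Retail", "Wholesale"]
ZONES = {
    "RawZone": ["Full", "Incr", "ITD"],
    "StageZone": ["Full", "Incr"],
    "CuratedZone": ["Global"],
}


def zone_first(areas, zones):
    """Option 1: the zone is the container, the subject area is a folder in it."""
    return [
        f"{zone}/{area}/{load}"
        for zone, loads in zones.items()
        for area in areas
        for load in loads
    ]


def area_first(areas, zones):
    """Option 2: the subject area is the container, the zone is a folder in it."""
    return [
        f"{area}/{zone}/{load}"
        for area in areas
        for zone, loads in zones.items()
        for load in loads
    ]


def containers(paths):
    """Return the distinct top-level containers implied by a list of paths."""
    return sorted({path.split("/")[0] for path in paths})


option1 = zone_first(SUBJECT_AREAS, ZONES)
option2 = area_first(SUBJECT_AREAS, ZONES)
print(containers(option1))
print(containers(option2))
```

Both options hold the same leaf folders. What changes is the number of containers (one per zone versus one per subject area) and therefore the scope at which a container-level permission would apply.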

The second option is specific to a subject area, so access rights can easily be assigned to specific groups, but the drawback is that there are as many RawZone folders as there are subject areas (the same applies to the other zones). In the first option, applying RBAC at the zone level results in the rights being inherited by the folders below, so users who do not belong to Retail get access to Wholesale, which means we would have to use ACLs (?) to remove the unwanted permissions.
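
The inheritance concern can be illustrated with a deliberately simplified model. It assumes that an RBAC data role granted at container scope authorizes every path in that container, and that ACLs are consulted only when RBAC has not already authorized the request, which is how the Azure documentation describes the evaluation order as far as I know. Under that assumption an ACL cannot take away access that RBAC has granted. The group names, the `can_read` function, and the prefix-matching rule are hypothetical and ignore details such as execute permission on parent folders.

```python
def can_read(user_groups, path, rbac_grants, acl_grants):
    """Simplified access check (an assumption-laden model, not the real service logic).

    rbac_grants: set of (group, container) pairs granted at container scope.
    acl_grants:  set of (group, folder_path) pairs granted on a folder and
                 assumed to cover everything beneath that folder.
    """
    container = path.split("/")[0]

    # RBAC at container scope covers every folder in the container.
    if any((group, container) in rbac_grants for group in user_groups):
        return True

    # ACLs are only consulted when RBAC did not authorize the request.
    for group, folder in acl_grants:
        if group in user_groups and (path == folder or path.startswith(folder + "/")):
            return True
    return False


retail_user = {"retail-analysts"}  # hypothetical group name

# Zone-level RBAC: the Retail group can also read Wholesale data.
rbac_only = {("retail-analysts", "RawZone")}
print(can_read(retail_user, "RawZone/Wholesale/Full", rbac_only, set()))

# No RBAC data role, ACL on the Retail folder only: Wholesale stays closed.
acl_only = {("retail-analysts", "RawZone/Retail")}
print(can_read(retail_user, "RawZone/Retail/Full", set(), acl_only))
print(can_read(retail_user, "RawZone/Wholesale/Full", set(), acl_only))
```

In this model the zone-first layout only isolates subject areas if the data-access role is not granted at the zone container and folder-level ACLs are used instead.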

Thanks,
grajee

Azure Data Lake Storage

An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.


1 answer

  1. HimanshuSinha 19,637 Reputation points Microsoft Employee Moderator
    2021-06-04T20:24:01.433+00:00

    Hello @grajee,
    Thanks for the question and for using the Microsoft Q&A platform.
    Your question was very descriptive. As I understand it, you are asking whether to use RBAC or ACLs. Since you are looking for granular permissions, I would suggest going with ACLs. Looking at the folder structure, it appears that you will have to set the ACLs manually, which can be a challenge for so many folders. I am not sure whether you have already gone through the doc here; I am sharing it just in case.
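
One way to reduce the manual effort is to generate the ACL entries per folder from a small mapping rather than typing them by hand. The sketch below only builds the ACL strings in the short `group:<object-id>:<permissions>` form, with a matching `default:` entry so that new child items inherit it. The object IDs, folder names, and function names are placeholders and assumptions; actually applying the entries (for example through a recursive ACL update in the Azure SDK or CLI) is left out on purpose.

```python
def acl_entries(group_object_id, permissions="r-x", include_default=True):
    """Build a comma-separated ACL string for one security group."""
    valid = (
        len(permissions) == 3
        and permissions[0] in "r-"
        and permissions[1] in "w-"
        and permissions[2] in "x-"
    )
    if not valid:
        raise ValueError(f"expected permissions like 'r-x', got {permissions!r}")

    entries = [f"group:{group_object_id}:{permissions}"]
    if include_default:
        # Default entries are copied to child items created later.
        entries.append(f"default:group:{group_object_id}:{permissions}")
    return ",".join(entries)


def acl_plan(zone_containers, area_groups, permissions="r-x"):
    """Map each '<zone>/<subject area>' folder to the ACL string for its group."""
    return {
        f"{zone}/{area}": acl_entries(object_id, permissions)
        for zone in zone_containers
        for area, object_id in area_groups.items()
    }


# Placeholder object IDs: replace with the real Microsoft Entra group IDs.
groups = {"Retail": "<retail-group-object-id>", "Wholesale": "<wholesale-group-object-id>"}
plan = acl_plan(["RawZone", "StageZone", "CuratedZone"], groups)
for folder, acl in plan.items():
    print(folder, "->", acl)
```

Note that a group typically also needs execute (`--x`) permission on each parent folder to traverse down to its own subject-area folder, which this sketch does not cover.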
    Thanks
    Himanshu
    Please do consider clicking on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members
