Welcome to the Microsoft Q&A (Preview) platform.
Happy to answer your query.
You may want to check out “FAQs about organizing a Data Lake”, which addresses your query.
If I need a separate dev, test, prod environment, how would this usually be handled?
Usually separate environments are handled with separate services. For instance, in Azure, that would be 3 separate Azure Data Lake Storage resources (which might be in the same subscription or different subscriptions).
We wouldn’t usually separate out dev/test/prod with a folder structure in the same data lake. It can be done (just like you could use the same database with a different schema for dev/test/prod) but it’s not the typical recommended way of handling the separation. We prefer having the exact same folder structure across all 3 environments. If you must get by with it being within one data lake (one service), then the environment should be the top-level node.
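To make the two layouts concrete, here is a minimal Python sketch that builds the same relative folder path in both setups. The account names (`contosolakedev`, `contosolake`, and so on), the container name, and the helper function names are hypothetical and only for illustration; the `abfss://<container>@<account>.dfs.core.windows.net/<path>` form is the standard ADLS Gen2 URI scheme.

```python
# Sketch only: account names, container names and helper names are assumptions.
ENVIRONMENTS = ("dev", "test", "prod")

# Option 1 (typical): one storage account per environment (hypothetical names).
ACCOUNTS = {
    "dev": "contosolakedev",
    "test": "contosolaketest",
    "prod": "contosolakeprod",
}


def _check_env(env: str) -> None:
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")


def path_separate_accounts(env: str, container: str, relative_path: str) -> str:
    """Same folder structure everywhere; only the account differs per environment."""
    _check_env(env)
    return (
        f"abfss://{container}@{ACCOUNTS[env]}.dfs.core.windows.net/"
        f"{relative_path.strip('/')}"
    )


def path_single_account(
    env: str, container: str, relative_path: str, account: str = "contosolake"
) -> str:
    """Fallback: one account, with the environment as the top-level folder."""
    _check_env(env)
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{env}/{relative_path.strip('/')}"
    )
```

With separate accounts, `path_separate_accounts("dev", "raw", "sales/2024")` and `path_separate_accounts("prod", "raw", "sales/2024")` differ only in the account name, so the folder structure stays identical across environments. With a single account, `path_single_account("dev", "raw", "sales/2024")` places `dev` as the first folder under the container.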
Regarding monitoring in ADLS Gen2:
Azure Data Lake Storage Gen2 provides metrics in the Azure portal under the Data Lake Storage Gen2 account and in Azure Monitor. Availability of Data Lake Storage Gen2 is displayed in the Azure portal. To get the most up-to-date availability of a Data Lake Storage Gen2 account, you must run your own synthetic tests to validate availability. Other metrics, such as total storage utilization, read/write requests, and ingress/egress, can be consumed by monitoring applications and can also trigger alerts when thresholds (for example, average latency or number of errors per minute) are exceeded.
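As a rough illustration of a synthetic test with threshold checks, here is a minimal Python sketch. It is not an Azure API: `probe` is a placeholder for whatever small operation you choose to run against your account (for example, writing and reading back a tiny file), and the function names and threshold values are assumptions made for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeSummary:
    attempts: int
    errors: int
    average_latency_s: float

    @property
    def availability(self) -> float:
        """Fraction of probe attempts that succeeded."""
        if self.attempts == 0:
            return 0.0
        return (self.attempts - self.errors) / self.attempts


def run_synthetic_test(probe: Callable[[], None], attempts: int = 5) -> ProbeSummary:
    """Call `probe` repeatedly, counting failures and timing every attempt."""
    errors = 0
    total_latency = 0.0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            probe()  # placeholder: your own small read/write against the account
        except Exception:
            errors += 1
        total_latency += time.perf_counter() - start
    average = total_latency / attempts if attempts else 0.0
    return ProbeSummary(attempts, errors, average)


def breached_thresholds(
    summary: ProbeSummary, max_average_latency_s: float, max_errors: int
) -> List[str]:
    """Return the names of the thresholds that the summary exceeds."""
    breached = []
    if summary.average_latency_s > max_average_latency_s:
        breached.append("average latency")
    if summary.errors > max_errors:
        breached.append("error count")
    return breached
```

You could run this on a schedule and raise an alert in your monitoring tool whenever `breached_thresholds` returns a non-empty list. The thresholds themselves depend on your workload.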
For more details, refer to “Best practices for using Azure Data Lake Storage Gen2”.
Hope this helps. Do let us know if you have any further queries.