Data lake zones and containers: what are the best practices for file names?

manish verma
2023-10-16T07:58:21.98+00:00

Hi All,

Following the best practices from the Cloud Adoption Framework, can somebody tell me the standards for file names?

Data Lake | Layers | Container number | Container name | File Name Standard
1 | Raw | 1 | Landing | ?
1 | Raw | 2 | Conformance | ?
2 | Enriched | 1 | Standardized | ?
2 | Curated | 2 | Data products | ?
3 | Development | 1 | Analytics sandbox | ?

Azure Data Lake Analytics

Accepted answer
  PRADEEPCHEEKATLA-MSFT (Microsoft Employee)
    2023-10-17T04:54:54.8733333+00:00

    @manish verma - Thanks for the question and for using the MS Q&A platform.

    When it comes to naming files in a data lake, there are several best practices that you can follow to ensure consistency and ease of use. Here are some general guidelines:

    1. Use descriptive names: Use descriptive names that accurately reflect the contents of the file. This will make it easier for users to find and understand the data.
    2. Use a consistent naming convention: Use a consistent naming convention across all files in the data lake. This will make it easier to organize and search for files.
    3. Avoid special characters: Avoid spaces and characters such as #, %, or ? in file names, because they must be escaped in URLs and can break some tools. Underscores and hyphens are generally safe separators; alternatively, use camel case or Pascal case to separate words.
    4. Use version control: If you need to make changes to a file, use version control to keep track of the changes. This will help prevent confusion and ensure that users are always working with the latest version of the file.
    5. Use a hierarchical folder structure: Use a hierarchical folder structure to organize files in the data lake. This will make it easier to find and access files.
    6. Use metadata: Use metadata to provide additional information about the file, such as the date it was created, the author, or the purpose of the file.
    7. Follow any specific naming conventions or standards that are required by your organization or industry.
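    As a minimal sketch of guidelines 1, 2, 3 and 5 applied together, the Python below builds a file path from a source system, an entity name, and a load date, using camel case so that no special characters end up in the name. The layout `<source>/<entity>/<yyyy>/<mm>/<dd>/<entity><yyyymmdd>.<ext>` and the helper names `to_camel_case` and `build_relative_path` are hypothetical choices for illustration, not an Azure or Cloud Adoption Framework standard.

```python
import re
from datetime import date


def to_camel_case(name: str) -> str:
    """Turn a readable name such as 'Customer Orders' into 'customerOrders'.

    Anything that is not an ASCII letter or digit (spaces, #, %, ?, ...) is
    treated as a word boundary and dropped, so the result has no special characters.
    """
    words = [w for w in re.split(r"[^A-Za-z0-9]+", name) if w]
    if not words:
        raise ValueError(f"no letters or digits in {name!r}")
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w[0].upper() + w[1:].lower() for w in rest)


def build_relative_path(source: str, entity: str, load_date: date, extension: str) -> str:
    """Return <source>/<entity>/<yyyy>/<mm>/<dd>/<entity><yyyymmdd>.<ext> (hypothetical layout)."""
    if not re.fullmatch(r"[a-z0-9]+", extension):
        raise ValueError(f"extension must be lowercase letters/digits: {extension!r}")
    src, ent = to_camel_case(source), to_camel_case(entity)
    # Hierarchical folders: source system, then entity, then load date.
    folders = f"{src}/{ent}/{load_date:%Y/%m/%d}"
    # File name: entity plus a date stamp, so the file stays identifiable if copied elsewhere.
    return f"{folders}/{ent}{load_date:%Y%m%d}.{extension}"
```

    For example, `build_relative_path("CRM", "customer orders", date(2023, 10, 16), "csv")` returns `crm/customerOrders/2023/10/16/customerOrders20231016.csv`. Putting the date in both the folders and the file name is a design choice: the date folders let tools read only the days they need, while the stamp keeps the file self-describing.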

    According to the Analytics end-to-end with Azure Synapse documentation:

    Azure Data Lake is used as the home for data throughout the various stages of the data lifecycle. Azure Data Lake is organized by different layers and containers as follows:

    • The Raw layer is the landing area for data coming in from source systems. As the name implies, data in this layer is in raw, unfiltered, and unpurified form.
    • In the next stage of the lifecycle, data moves to the Enriched layer where data is cleaned, filtered, and possibly transformed.
    • Data then moves to the Curated layer, which is where consumer-ready data is maintained.
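    The layer-to-container mapping from the question's table can be sketched as a lookup that builds an ADLS Gen2 `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI. The storage account names `contosolake1` to `contosolake3` and the function name `abfss_uri` are hypothetical placeholders; the hyphenated container names reflect Azure's rule that container names use only lowercase letters, digits, and hyphens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    account: str    # storage account acting as one "data lake" (hypothetical name)
    container: str  # ADLS Gen2 container: lowercase letters, digits, and hyphens only


# Hypothetical mapping of the question's table; account names are placeholders.
# Enriched and Curated share data lake 2, as in the table.
ZONES = {
    ("raw", "landing"): Zone("contosolake1", "landing"),
    ("raw", "conformance"): Zone("contosolake1", "conformance"),
    ("enriched", "standardized"): Zone("contosolake2", "standardized"),
    ("curated", "data products"): Zone("contosolake2", "data-products"),
    ("development", "analytics sandbox"): Zone("contosolake3", "analytics-sandbox"),
}


def abfss_uri(layer: str, container: str, relative_path: str) -> str:
    """Build an abfss:// URI for a file in the given layer and container."""
    zone = ZONES[(layer.lower(), container.lower())]
    path = relative_path.lstrip("/")
    return f"abfss://{zone.container}@{zone.account}.dfs.core.windows.net/{path}"
```

    Reusing the same relative path in every layer, for example `abfss_uri("Raw", "Landing", rel)` and later `abfss_uri("Enriched", "Standardized", rel)`, is one possible choice (an assumption, not a requirement) that makes it easy to trace a file as it moves through the lifecycle.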

    In terms of the specific example you provided, your containers are organized by layer and purpose. This is a common approach, and it works well as long as the naming convention is consistent and easy to understand.
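    To keep a convention consistent in practice, existing paths can be checked automatically. The sketch below assumes a hypothetical layout `<source>/<entity>/<yyyy>/<mm>/<dd>/<entity><yyyymmdd>.<ext>` with camel-case segments, and the function name `follows_convention` is illustrative; the regular expression would need to be adapted to whatever convention your organization adopts.

```python
import re
from datetime import date

# Hypothetical convention: camel-case source and entity folders, date folders,
# and a file name that repeats the entity and the folder date.
_PATTERN = re.compile(
    r"(?P<source>[a-z][A-Za-z0-9]*)/"
    r"(?P<entity>[a-z][A-Za-z0-9]*)/"
    r"(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/"
    r"(?P=entity)(?P=y)(?P=m)(?P=d)\.(?P<ext>[a-z0-9]+)"
)


def follows_convention(path: str) -> bool:
    """Return True if the path matches the layout and its date is a real calendar date."""
    match = _PATTERN.fullmatch(path)
    if match is None:
        return False
    try:
        date(int(match["y"]), int(match["m"]), int(match["d"]))
    except ValueError:  # e.g. 2023/02/30
        return False
    return True
```

    For example, `crm/customerOrders/2023/10/16/customerOrders20231016.csv` passes, while a path containing a space, or one whose file-name date differs from its folder date, is rejected.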

    Hope this helps. Do let us know if you have any further queries.


