Best practices for naming files processed by Azure Databricks

manish verma 516 Reputation points
2023-10-12T04:58:55.2333333+00:00

We are looking for best practices for naming files that are processed by Azure Databricks.

 

We have seen many suggested filename conventions, e.g.:

filename_yyyy_MM_dd

filename_extension.csv

filename_Fullload_YYYY_MM_DD

filename_Incremental_YYYY_MM_DD


Request to a Microsoft expert: please suggest a filename standard for each of the following:

Inbound folder

Process folder

Curated folder

 

 

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  PRADEEPCHEEKATLA (Moderator)
    2023-10-12T07:16:17.76+00:00

    @manish verma - Thanks for the question and using MS Q&A platform.

    When it comes to naming files that are processed by Azure Databricks, there are several best practices that you can follow to ensure consistency and clarity. Here are some suggestions for naming files in the inbound, process, and curated folders:

    1. Inbound folder: For files that are being uploaded to the inbound folder, it is a good practice to include a timestamp in the file name to indicate when the file was uploaded. For example, you could use the following naming convention: filename_yyyy_MM_dd_HH_mm_ss.csv. This will help you keep track of when the file was uploaded and ensure that you are processing the most recent version of the file.
    2. Process folder: For files that are being processed by Azure Databricks, it is a good practice to include a description of the processing that is being done in the file name. For example, you could use the following naming convention: filename_Processed.csv. This will help you identify which files have been processed and which ones still need to be processed.
    3. Curated folder: For files that have been processed and are ready for use, it is a good practice to include a description of the data in the file name. For example, you could use the following naming convention: filename_Curated.csv. This will help you identify which files contain curated data and which ones still need to be curated.

    In addition to these suggestions, it is important to ensure that your file names are consistent and easy to understand. Avoid using special characters or spaces in file names, as these can cause issues when processing the files. Also, make sure that your file names are descriptive and provide enough information to identify the contents of the file.
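    The advice to avoid special characters and spaces can be enforced with a simple validation check before a file is landed. This is a hypothetical sketch; the allowed character set (letters, digits, underscores, and a single extension) is an assumption, not a rule from the answer above.

```python
import re

# Assumed "safe" pattern: letters, digits, and underscores,
# followed by exactly one dot and an alphanumeric extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9]+$")

def is_safe_filename(name: str) -> bool:
    # Reject names containing spaces or special characters,
    # which can cause issues during downstream processing.
    return bool(SAFE_NAME.fullmatch(name))
```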

    It's important to plan your data structure before you land it into a data lake. When you have a plan, you can use security, partitioning, and processing effectively.

    For more details, refer to Data lake zones and containers.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further query, do let us know.


0 additional answers
