Best practices for naming files processed by Azure Databricks

manish verma 516 Reputation points
2023-10-12T04:58:55.2333333+00:00

We are looking for best practices for naming files that are processed by Azure Databricks.

 

We have seen many suggested filename conventions, e.g.:

filename_yyyy_MM_dd

filename_extension.csv

filename_Fullload_YYYY_MM_DD

filename_Incremental_YYYY_MM_DD


Request to a Microsoft expert: please suggest a filename standard for each of the following:

Inbound folder

Process folder

Curated folder

 

 

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  PRADEEPCHEEKATLA (Moderator)
    2023-10-12T07:16:17.76+00:00

    @manish verma - Thanks for the question and using MS Q&A platform.

    When it comes to naming files that are processed by Azure Databricks, there are several best practices that you can follow to ensure consistency and clarity. Here are some suggestions for naming files in the inbound, process, and curated folders:

    1. Inbound folder: For files that are being uploaded to the inbound folder, it is a good practice to include a timestamp in the file name to indicate when the file was uploaded. For example, you could use the following naming convention: filename_yyyy_MM_dd_HH_mm_ss.csv. This will help you keep track of when the file was uploaded and ensure that you are processing the most recent version of the file.
    2. Process folder: For files that are being processed by Azure Databricks, it is a good practice to include a description of the processing that is being done in the file name. For example, you could use the following naming convention: filename_Processed.csv. This will help you identify which files have been processed and which ones still need to be processed.
    3. Curated folder: For files that have been processed and are ready for use, it is a good practice to include a description of the data in the file name. For example, you could use the following naming convention: filename_Curated.csv. This will help you identify which files contain curated data and which ones still need to be curated.

    In addition to these suggestions, it is important to ensure that your file names are consistent and easy to understand. Avoid using special characters or spaces in file names, as these can cause issues when processing the files. Also, make sure that your file names are descriptive and provide enough information to identify the contents of the file.
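    The advice to avoid special characters and spaces can be enforced with a simple validation check before a file is landed. This is a hypothetical sketch; the allowed character set (letters, digits, underscores, and a single extension) is an assumption, not a rule from the answer above.

```python
import re

# Assumed "safe" pattern: letters, digits, and underscores,
# followed by exactly one dot and an alphanumeric extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9]+$")

def is_safe_filename(name: str) -> bool:
    # Reject names containing spaces or special characters,
    # which can cause issues during downstream processing.
    return bool(SAFE_NAME.fullmatch(name))
```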

    It's important to plan your data structure before you land it into a data lake. When you have a plan, you can use security, partitioning, and processing effectively.

    For more details, refer to Data lake zones and containers.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further query, do let us know.


0 additional answers
