How to dynamically access files from a mounted data lake in a Databricks notebook?

Varun S Kumar 50 Reputation points
2023-10-29T13:46:39.73+00:00

Hello everyone,

I have a Databricks notebook running Python code for ETL transformation of data from CSV files. The CSV files are in Azure Blob Storage, and I have mounted that storage for my notebook using dbutils.fs.mount.
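For context, the mount was created roughly like this (the container, storage account, secret scope, and key names below are placeholders, not my real values):

    # Hypothetical mount call -- container, account, scope, and key names
    # are placeholders. wasbs:// is the Azure Blob Storage scheme.
    dbutils.fs.mount(
        source="wasbs://<container>@<storage-account>.blob.core.windows.net/",
        mount_point="/mnt/root",
        extra_configs={
            "fs.azure.account.key.<storage-account>.blob.core.windows.net":
                dbutils.secrets.get(scope="<scope>", key="<key-name>")
        }
    )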

Now, the csv files are stored in the following directory structure: root/year/month/day/file.csv

For example, today being 29 October 2023, a file will be stored inside the blob with the following path: root/2023/10/29/file.csv

I have mounted the root of the storage. I want to access the latest date's file every time I run the notebook. So today I need to read the CSV inside root/2023/10/29/, but tomorrow when I run the notebook it should be root/2023/10/30/, and so on.

How can I achieve this using Python code?


Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 80,096 Reputation points Microsoft Employee
    2023-10-30T05:39:58.1433333+00:00

    @Varun S Kumar - Thanks for the question and for using the MS Q&A platform.

    To dynamically access the latest date's file from the mounted blob storage in a Databricks notebook, you can use Python's datetime module to get the current date and then construct the file path from it.

    Here's an example code snippet that you can use:

    import datetime
    
    # Get the current date
    now = datetime.datetime.now()
    
    # Construct the path to the file based on the current date.
    # :02d zero-pads the month and day (e.g. 2023/01/05); drop the padding
    # if your folders are named without leading zeros.
    path = f"/mnt/<mount-point>/root/{now.year}/{now.month:02d}/{now.day:02d}/file.csv"
    
    # Read the CSV file using the constructed path
    df = spark.read.format("csv").option("header", "true").load(path)
    

    In the above code, replace <mount-point> with the name of the mount point you used when mounting the blob storage. The datetime.datetime.now() function returns the current date and time, and the f-string syntax builds the path string from the current year, month, and day.

    You can then use the constructed path to read the CSV file with spark.read. This reads the file for the current date every time you run the notebook.
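    If the current date's folder might not exist yet when the notebook runs, a variation is to list the mounted folders and pick the most recent date that actually exists. This is a sketch beyond the original answer; it assumes the mount point is /mnt/root and that the year/month/day folder names are purely numeric:

    # Sketch: descend year -> month -> day, picking the numerically largest
    # folder name at each level. Assumes /mnt/root is the mount point and
    # raises ValueError if a level has no numeric folder names.
    def latest_partition(base="/mnt/root"):
        path = base
        for _ in range(3):  # year, then month, then day
            names = [f.name.strip("/") for f in dbutils.fs.ls(path)]
            latest = max((n for n in names if n.isdigit()), key=int)
            path = f"{path}/{latest}"
        return path

    df = (spark.read.format("csv")
          .option("header", "true")
          .load(f"{latest_partition()}/file.csv"))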

    As per the repro, I created a similar folder structure in an ADLS Gen2 account and was able to get the data as per your requirement.
    [Screenshots of the ADLS Gen2 folder structure and the resulting output were attached here.]

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further queries, do let us know.

    1 person found this answer helpful.

0 additional answers
