Hi @arkiboys,
Thank you for using the Microsoft Q&A platform and posting your query.
As I understand it, you want to fetch the files from the latest date folder present in the ADLS directory. Please correct me if my understanding is incorrect.
For this requirement, you need to navigate to the latest year folder, find the latest month subfolder within it, then find the latest date subfolder within that month, and finally get all the files present in that date folder.
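A minimal sketch of that traversal, written so it can run outside Databricks. The names `latest_partition`, `FileInfo` and `fake_ls` are hypothetical and only for illustration. The sketch assumes the listing function returns objects with a `.path` attribute, that directory paths end with `/` (as `dbutils.fs.ls` reports them), and that folder names are zero-padded so string comparison matches date order.

```python
from collections import namedtuple

# Stand-in for the objects returned by dbutils.fs.ls (only .path is used here).
FileInfo = namedtuple("FileInfo", ["path"])


def latest_partition(ls, root, depth=3):
    """Descend `depth` levels (year/month/day), taking the greatest folder name each time."""
    current = root
    for _ in range(depth):
        # Keep directories only; assumed to be reported with a trailing '/'.
        subdirs = [f.path for f in ls(current) if f.path.endswith("/")]
        if not subdirs:
            raise FileNotFoundError(f"no subfolders under {current}")
        # String comparison; assumes zero-padded names such as 02, 03.
        current = max(subdirs)
    return current


# Hypothetical in-memory folder tree used in place of a real file system.
tree = {
    "data/": ["data/2021/", "data/2022/"],
    "data/2021/": ["data/2021/12/"],
    "data/2021/12/": ["data/2021/12/31/"],
    "data/2022/": ["data/2022/02/", "data/2022/03/"],
    "data/2022/02/": ["data/2022/02/26/"],
    "data/2022/03/": ["data/2022/03/01/", "data/2022/03/02/"],
}


def fake_ls(path):
    return [FileInfo(p) for p in tree[path]]


print(latest_partition(fake_ls, "data/"))
```

In a notebook, `dbutils.fs.ls` could be passed in place of `fake_ls`, for example `latest_partition(dbutils.fs.ls, '/FileStore/data/')`.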
I have files in the following folder structure for the demo:
data/2022/02/26/parquet files
...
data/2022/03/02/parquet files
I created a notebook in my Databricks workspace and executed the following steps:
1. Use dbutils.fs.ls to list all the month subfolders within the 2022 year folder. Collect their paths in a for loop and sort them in reverse order, so the latest month comes first.
fileInfos = dbutils.fs.ls('/FileStore/data/2022/')
monthPaths = []
for fileInfo in fileInfos:
    monthPaths.append(fileInfo.path)
# Latest month first (string sort, so folder names must be zero-padded: 02, 03, ...)
monthPaths.sort(reverse=True)
2. Use dbutils.fs.ls to list all the date subfolders within the latest month folder. Collect their paths in a for loop and sort them in reverse order, so the latest date comes first.
# monthPaths[0] is the latest month folder from step 1
fileInfos = dbutils.fs.ls(monthPaths[0])
dayPaths = []
for fileInfo in fileInfos:
    dayPaths.append(fileInfo.path)
# Latest date first
dayPaths.sort(reverse=True)
print(dayPaths)
3. Append '*.parquet' to the latest date path to match all the Parquet files in that folder. The printed output is the path pattern for the files in the latest date folder.
latestDatePath = dayPaths[0] + '*.parquet'
print(latestDatePath)
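One caveat about the reverse sort in steps 1 and 2: it compares paths as strings, which matches date order only when the folder names are zero-padded, as they are in this demo (02, 03). A small sketch of the difference, using hypothetical unpadded month folders and a hypothetical helper `folder_number` that sorts on the numeric value of the last path component instead:

```python
def folder_number(path):
    """Return the last path component as an int, e.g. '.../2022/3/' -> 3."""
    return int(path.rstrip("/").rsplit("/", 1)[-1])


# Hypothetical month folders that are NOT zero-padded.
paths = [
    "dbfs:/FileStore/data/2022/9/",
    "dbfs:/FileStore/data/2022/10/",
    "dbfs:/FileStore/data/2022/11/",
]

by_string = sorted(paths, reverse=True)                     # compares characters
by_number = sorted(paths, key=folder_number, reverse=True)  # compares integers

print(by_string[0])  # the '9' folder sorts first as a string
print(by_number[0])  # the '11' folder sorts first numerically
```

If the folders are always zero-padded, the plain string sort used in the steps above is sufficient.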
Hope this helps. Please let us know if you have any further queries.