First, you need to mount your Azure Blob storage container to Databricks so you can access its files. You can do this with the dbutils.fs.mount() method, specifying the storage account name, container name, and access key.
Once the mount is in place, you can use dbutils.fs.ls() to list all files in a directory. It returns a list of FileInfo objects, each containing details like the path, name, size, and modification time (modificationTime, in milliseconds since the epoch).
To filter them by modification time, write a function that compares each file's modificationTime with your desired cutoff timestamp.
If you receive files every two minutes, consider setting up a scheduled job in Databricks that runs every 2 minutes.
# Mount the blob storage container
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/my_mount_point",
    extra_configs={"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<access-key>"}
)

# Filter and load files modified on or after the given timestamp (epoch milliseconds)
def load_recent_files(desired_timestamp):
    files = dbutils.fs.ls("/mnt/my_mount_point/")
    for file in files:
        if file.modificationTime >= desired_timestamp:
            df = spark.read.csv(file.path)
            # Process and load the dataframe as needed

# NB: Don't forget to schedule this function to run at your desired interval
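As a rough usage sketch (the two-minute window and the call to load_recent_files are my assumptions, matching the schedule described above), you could compute the cutoff from the current time, since modificationTime is expressed in milliseconds since the epoch:

import time

# Hypothetical driver code: pick up files modified within the last 2 minutes.
# Assumes the notebook itself runs every 2 minutes, so each run only processes
# files that arrived since the previous run.
cutoff_ms = int(time.time() * 1000) - 2 * 60 * 1000
load_recent_files(cutoff_ms)

For the schedule itself, create a Databricks job that runs this notebook and give it a cron expression along the lines of 0 0/2 * * * ? (Quartz syntax, every two minutes).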