Datalake - Get Last file modified date in Synapse notebook

Sivagnana Sundaram, Krithiga 31 Reputation points
2021-08-09T14:48:59.3+00:00

Hi - I am trying to get the last modified date of a file in the data lake (ADLS Gen2) from a Spark notebook. I am using mssparkutils, but I don't see an option to get the file details.

Are there any options I can try?

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Saurabh Sharma 23,791 Reputation points Microsoft Employee
    2021-08-10T15:28:30.157+00:00

    Currently, mssparkutils does not expose the file modification time when you call the mssparkutils.fs.ls API.
    As a workaround, you can call the Hadoop FileSystem APIs directly to get it.

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hadoop configuration of the current Spark session
    val conf = spark.sessionState.newHadoopConf()
    // A URI without a scheme resolves to the default file system in conf
    val fs = FileSystem.get(new URI("/"), conf)
    // getModificationTime returns milliseconds since the Unix epoch
    for (fileStatus <- fs.listStatus(new Path("/"))) {
        println(s"file name: ${fileStatus.getPath.getName} - modify time: ${fileStatus.getModificationTime}")
    }
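
The modify time printed above is a raw count of milliseconds since the Unix epoch, so it usually needs one more step to become a readable date. The following is a minimal sketch of that step, written as a notebook cell and assuming the listing has already been collected into name and timestamp pairs. `FileInfo`, `toInstant`, `latest`, and the sample values are hypothetical names introduced here for illustration; they are not part of mssparkutils or Hadoop.

```scala
import java.time.Instant

// Hypothetical holder for the two values printed by the loop above:
// fileStatus.getPath.getName and fileStatus.getModificationTime.
final case class FileInfo(name: String, modifiedMillis: Long)

// Convert epoch milliseconds to a UTC timestamp.
def toInstant(modifiedMillis: Long): Instant = Instant.ofEpochMilli(modifiedMillis)

// Most recently modified entry, or None when the listing is empty.
def latest(files: Seq[FileInfo]): Option[FileInfo] =
  if (files.isEmpty) None else Some(files.maxBy(_.modifiedMillis))

// Sample values made up for illustration only.
val sample = Seq(
  FileInfo("a.csv", 1628520539000L),
  FileInfo("b.csv", 1628609310000L)
)

latest(sample).foreach { f =>
  println(s"${f.name} last modified at ${toInstant(f.modifiedMillis)}")
}
```

In a real notebook, the sample sequence could probably be built from the listing itself, for example by mapping each `fileStatus` to `FileInfo(fileStatus.getPath.getName, fileStatus.getModificationTime)`.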
    


    Additionally, the product team has opened an internal work item to address this in a future release. Please let me know if you have any questions.

    Thanks
    Saurabh

    ----------

    Please do not forget to "Accept the answer" wherever the information provided helps you to help others in the community.

