Get Last Modified Date on Partitioned Data

Mariah 136 Reputation points
2021-10-04T22:06:42.4+00:00

I am ultimately trying to get the names of the partitions (e.g. DATE_ID=20211004) that have been modified within the last 24 hours. I've tried using the Get Metadata activity, but it seems to iterate only over the partition folders, not the files inside them. I need the last modified date of the partitioned files within the folders, but I'm not sure how to get it.

Folder structure:

product_data/
|---------DATE_ID=20211002/
|---------|---------part-00011-tid-345678900-abc123-6793-1.c000.snappy.parquet
|---------|---------committed_123456789
|---------DATE_ID=20211003/
|---------|---------part-00086-tid-345678900-abc123-6756-2.c000.snappy.parquet
|---------|---------committed_123456789
|---------DATE_ID=20211004/
|---------|---------part-00042-tid-345678900-abc123-6712-1.c000.snappy.parquet
|---------|---------committed_123456789

In the example above, I'd like to get the last modified date of each of the part-00.......parquet files.

I've tried a variety of ForEach + Get Metadata activity combinations, such as the one below:

https://ibb.co/B4vMF6j

Azure Data Factory

1 answer

  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-10-05T23:50:36.013+00:00

    Hi @Mariah ,

    Welcome to Microsoft Q&A forum and thanks for reaching out here.

    If you just want the last modified date of the .parquet files in your partitioned folders, one way is to use two Get Metadata activities.

    1. First, add a Get Metadata activity, point its dataset to the root folder (product_data), and use the childItems argument to get the list of partition folder names (e.g. [DATE_ID=20211002, DATE_ID=20211003, ...]) that contain the .parquet files whose last modified dates and file names you want.
    2. Then add a ForEach activity, pass it the Get Metadata activity output containing the list of partition folder names, and loop through each folder name.
    3. Inside the ForEach activity, add a second Get Metadata activity and point its dataset dynamically to the partition folder name supplied by each iteration. In this activity, configure the Filter by last modified start and end times as required, and select childItems and lastModified as arguments, which returns the list of files modified in the last 24 hours. You can add a subsequent Filter activity if you want to keep only files whose names start with part-.
    4. Then add an Append Variable activity to collect the list of file names and their last modified datetimes.
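
To make the logic of the four steps concrete, here is a minimal Python sketch of the same flow run against a local folder tree instead of a Data Factory pipeline. It is only an analogue, not ADF code: the function name `recently_modified_partitions` and its parameters (`root`, `hours`, `prefix`, `now`) are hypothetical names chosen for illustration, and file modification times come from the local filesystem rather than from a Get Metadata activity.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def recently_modified_partitions(root, hours=24, prefix="part-", now=None):
    """Return {partition_folder_name: [(file_name, last_modified_utc), ...]}.

    Hypothetical helper: a local-filesystem analogue of the pipeline steps.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    results = {}
    # Step 1: list the partition folders under the root
    # (the role of childItems on product_data).
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Steps 2-3: loop over each folder and inspect the files inside it,
        # keeping only names that start with the prefix (the Filter activity).
        for f in sorted(folder.iterdir()):
            if not f.is_file() or not f.name.startswith(prefix):
                continue
            modified = datetime.fromtimestamp(f.stat().st_mtime, tz=timezone.utc)
            if modified >= cutoff:
                # Step 4: collect the file name and its timestamp
                # (the role of Append Variable).
                results.setdefault(folder.name, []).append((f.name, modified))
    return results
```

The keys of the returned dictionary are the partition names (for example DATE_ID=20211004) that contain at least one part- file modified inside the window, which is the list the original question asks for.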

    Hope this helps. Let us know if you have any further questions.
