Azure Data Factory Data Set - the most recent file

Gabryel, Andrzej 161 Reputation points
2020-12-30T13:41:08.26+00:00

Hi,

To validate data arriving in Azure Data Lake, I have built a pipeline and a data flow (the data flow runs inside the pipeline). When a new file is added to the folder, the pipeline should start immediately and use the newest file in that folder (that is, the file that was just added). Two things are missing:

  1. Event based trigger
  2. I have to define the dataset so that the most recent file is always taken from the folder. Both the pipeline and the data flow need to use the newest file. Right now, I have no idea how to do that.

My pipeline begins this way. I need to use the newest file in the "Get Metadata Lease" activity here. The file used now is hardcoded.

52312-image.png

The beginning of my data flow is here. The most recent file should be used here as well.

52225-image.png

The data file used in the solution is here. I need to replace the hardcoded "Lease.csv" so that ADF takes the most recent file.
52217-image.png

Greetings
Andrzej

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. MartinJaffer-MSFT 25,811 Reputation points
    2020-12-31T21:56:51.81+00:00

    Hello @Gabryel, Andrzej and welcome back to Microsoft Q&A.

    If I understand correctly, what you really want to know is, "How do I get the file name from an event trigger?"

    For documentation, see step 10 and step 11 in how-to-create-event-trigger.

    The idea is to pass the file name and/or folder path from the event trigger into pipeline parameters. You then pass these pipeline parameters into your datasets through the activities.

    Step 1: Create some pipeline parameters.
    Step 2: From your pipeline editing screen, choose either New trigger or Edit trigger. It is important to do this from the pipeline editing screen and not the Manage > Triggers screen, because we want to use the context of your pipeline to set the parameters. In the last stage of creating or updating the trigger, you can specify the parameter values. We want to use:

    @triggerBody().folderPath  
    @triggerBody().fileName  
    

    52643-image.png
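
    For readers who prefer the JSON view over the screenshots, the trigger from step 2 might look roughly like the sketch below. The trigger name, pipeline name, parameter names (sourceFolder, sourceFile), and path filters are all hypothetical placeholders, and the angle-bracket values must be replaced with your own. Only the two @triggerBody() expressions come from the steps above.

    ```json
    {
      "name": "NewLeaseFileTrigger",
      "properties": {
        "description": "Sketch only: names and paths are hypothetical placeholders.",
        "type": "BlobEventsTrigger",
        "typeProperties": {
          "blobPathBeginsWith": "/<container>/blobs/<folder>/",
          "blobPathEndsWith": ".csv",
          "ignoreEmptyBlobs": true,
          "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
          "events": [ "Microsoft.Storage.BlobCreated" ]
        },
        "pipelines": [
          {
            "pipelineReference": {
              "referenceName": "ValidateLeasePipeline",
              "type": "PipelineReference"
            },
            "parameters": {
              "sourceFolder": "@triggerBody().folderPath",
              "sourceFile": "@triggerBody().fileName"
            }
          }
        ]
      }
    }
    ```

    Because the trigger fires on the BlobCreated event, the file name it passes is the file that was just added, so no separate "find the newest file" logic should be needed.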
    Step 3: Create a parameterized dataset.
    52721-image.png
    52722-image.png
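
    As a rough illustration of step 3, a parameterized delimited-text dataset could be defined along these lines. This is a sketch that assumes an ADLS Gen2 linked service; the dataset name, linked service name, and dataset parameter names (folderPath, fileName) are hypothetical. As far as I know, the folder path delivered by a storage event trigger begins with the container name, so you may need to trim that first segment or adjust the fileSystem setting to match.

    ```json
    {
      "name": "LeaseCsvParameterized",
      "properties": {
        "description": "Sketch only: names are hypothetical; assumes ADLS Gen2.",
        "type": "DelimitedText",
        "linkedServiceName": {
          "referenceName": "AdlsGen2LinkedService",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "folderPath": { "type": "string" },
          "fileName": { "type": "string" }
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "<container>",
            "folderPath": {
              "value": "@dataset().folderPath",
              "type": "Expression"
            },
            "fileName": {
              "value": "@dataset().fileName",
              "type": "Expression"
            }
          },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }
    ```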
    Step 4: Pass the pipeline parameters into the activity so that they are used as the dataset parameters.
    52693-image.png
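
    Putting steps 1 and 4 together, the pipeline JSON might look something like the following sketch. The pipeline name, the dataset name (LeaseCsvParameterized), and the parameter names are hypothetical; the "Get Metadata Lease" activity name is taken from the question. The fieldList values are just an example of what a validation step might request.

    ```json
    {
      "name": "ValidateLeasePipeline",
      "properties": {
        "description": "Sketch only: names are hypothetical placeholders.",
        "parameters": {
          "sourceFolder": { "type": "string" },
          "sourceFile": { "type": "string" }
        },
        "activities": [
          {
            "name": "Get Metadata Lease",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": {
                "referenceName": "LeaseCsvParameterized",
                "type": "DatasetReference",
                "parameters": {
                  "folderPath": {
                    "value": "@pipeline().parameters.sourceFolder",
                    "type": "Expression"
                  },
                  "fileName": {
                    "value": "@pipeline().parameters.sourceFile",
                    "type": "Expression"
                  }
                }
              },
              "fieldList": [ "columnCount", "structure" ]
            }
          }
        ]
      }
    }
    ```

    The same pipeline parameters can be passed to the data flow activity's source dataset in the same way, so both the pipeline and the data flow read the file that fired the trigger.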

