Need help in understanding times in ADF's tumbling window trigger

Zhengxiao (Edward) Wang 0 Reputation points Microsoft Employee
2024-05-20T10:01:15.2+00:00

Hi Team,

I'm confused by the many different times in a tumbling window trigger.

There are at least 4 of them:

  • @trigger().outputs.windowStartTime
  • @trigger().outputs.windowEndTime
  • @trigger().scheduledTime
  • @trigger().startTime

In addition, there is a trigger time shown in the Azure portal.

My questions are

  1. Is trigger time the same as scheduledTime or startTime?
  2. What's the relationship between windowStartTime, windowEndTime and scheduledTime? Is scheduledTime always the same as windowEndTime or windowStartTime or a time between them?
  3. Because scheduledTime is the time at which the trigger was scheduled to invoke the pipeline run, and startTime is the time at which the trigger **actually** fired to invoke the pipeline run, does this mean startTime should be later than scheduledTime?
  4. Let's say it's 2024-05-20T08:30:00 now and I want to create an hourly tumbling window trigger whose first occurrence is at 2024-05-20T09:00:00. How should I set my start time? I observed that the trigger time is always very close to the window end time. Does this mean I should set the start time to 2024-05-20T08:00:00 instead of 2024-05-20T09:00:00?

Thanks in advance!

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Harishga 5,110 Reputation points Microsoft Vendor
    2024-05-20T11:04:08.0066667+00:00

    Hi @Zhengxiao (Edward) Wang
    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    To understand the different times in a tumbling window trigger, it's important to know their specific meanings.

    Trigger Time vs ScheduledTime vs StartTime

    • Trigger Time: This is the time displayed in the Azure portal. It’s the expected time for the trigger to execute.
    • ScheduledTime (@trigger().scheduledTime): The time that the trigger is configured to invoke the pipeline run.
    • StartTime (@trigger().startTime): The actual time when the trigger fires to invoke the pipeline run.

    Is Trigger Time the same as ScheduledTime or StartTime?

    • Trigger Time is generally the same as ScheduledTime. It’s the planned time for the trigger to fire.
    • StartTime could be the same or slightly after ScheduledTime, depending on when the system actually starts the trigger.

    Relationship Between WindowStartTime, WindowEndTime, and ScheduledTime

    • WindowStartTime (@trigger().outputs.windowStartTime): The start time of the data window for which the trigger will process data.
    • WindowEndTime (@trigger().outputs.windowEndTime): The end time of the data window for which the trigger will process data.
    • ScheduledTime: The time at which the trigger is due to fire. A tumbling window trigger can only process a window once that window has closed, so this lines up with WindowEndTime (plus the trigger's delay setting, if one is configured) rather than WindowStartTime.

    Is ScheduledTime always the same as WindowEndTime or WindowStartTime or a time between them?

    • ScheduledTime corresponds to WindowEndTime, so the trigger fires as soon as the window closes. It is not aligned with WindowStartTime and does not fall inside the window. If a delay is configured on the trigger, the run starts that much later than the window end.
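
The window arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not ADF code: `window_bounds` and `due_time` are hypothetical helper names, and the sketch assumes that the run for a window becomes due when the window ends, plus an optional delay.

```python
from datetime import datetime, timedelta


def window_bounds(trigger_start, interval, n):
    """Return (window_start, window_end) for the n-th window, counting from 0."""
    window_start = trigger_start + n * interval
    return window_start, window_start + interval


def due_time(trigger_start, interval, n, delay=timedelta(0)):
    """Assumed due time of the n-th run: the window end plus an optional delay."""
    _, window_end = window_bounds(trigger_start, interval, n)
    return window_end + delay


# Hourly trigger whose start time is 2024-05-20T08:00:00 (values from the question).
trigger_start = datetime(2024, 5, 20, 8, 0, 0)
one_hour = timedelta(hours=1)

for n in range(3):
    start, end = window_bounds(trigger_start, one_hour, n)
    due = due_time(trigger_start, one_hour, n)
    print(f"window {n}: {start.isoformat()} to {end.isoformat()}, due {due.isoformat()}")
```

Under that assumption, window 0 covers 08:00 to 09:00 and its run is due at 09:00, which matches the observation that the trigger time sits close to the window end time.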

    StartTime vs ScheduledTime

    • StartTime should be equal to or later than ScheduledTime due to potential system delays.
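
If you log both values from a pipeline run, you can check this relationship yourself. The following is a small sketch with made-up timestamps. `parse_utc` and `firing_lag` are hypothetical helpers, and the sketch assumes the values are ISO 8601 UTC strings that may carry more fractional digits than Python's `datetime` can hold.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse an ISO 8601 UTC timestamp, keeping at most microsecond precision."""
    body = timestamp.rstrip("Z")
    if "." in body:
        head, fraction = body.split(".", 1)
        # Python datetimes hold 6 fractional digits; truncate or pad to exactly 6.
        body = f"{head}.{fraction[:6].ljust(6, '0')}"
    return datetime.fromisoformat(body).replace(tzinfo=timezone.utc)


def firing_lag(scheduled_time, start_time):
    """Return how long after the scheduled time the trigger actually fired."""
    return parse_utc(start_time) - parse_utc(scheduled_time)


# Hypothetical values, as they might be logged from a pipeline run.
lag = firing_lag("2024-05-20T09:00:00Z", "2024-05-20T09:00:02.1234567Z")
print(lag.total_seconds())
```

A lag of zero or more is what the statement above predicts. A negative lag would suggest the two values were swapped or came from different runs.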

    Setting Up the Start Time for an Hourly Tumbling Window Trigger

    • Given the current time is 2024-05-20T08:30:00 and you want the first occurrence at 2024-05-20T09:00:00, you should set the startTime to 2024-05-20T08:00:00. The first window runs from startTime to startTime plus one interval, and the trigger fires when that window closes.
    • Your observation that the trigger time is close to the window end time is consistent with this: the trigger processes each window immediately after it closes.

    Use Case Example

    • Current Time: 2024-05-20T08:30:00
    • Desired First Occurrence: 2024-05-20T09:00:00
    • Configuration:
    • Set startTime to 2024-05-20T08:00:00.
    • This means your windowStartTime would be 2024-05-20T08:00:00 and windowEndTime would be 2024-05-20T09:00:00.
    • The ScheduledTime would be 2024-05-20T09:00:00, aligning with the windowEndTime.
    • The StartTime will be when the trigger actually fires, which should be close to the ScheduledTime.
    • If you set startTime to 2024-05-20T09:00:00 instead, the first window would be 09:00:00 to 10:00:00 and the first run would fire at about 10:00:00.

    Reference
    https://www.youtube.com/watch?v=vvuq-C_NXLI

    In this scenario, the trigger is set to process data from the past hour (08:00:00 to 09:00:00) once the window ends at 09:00:00. Setting the startTime to 2024-05-20T08:00:00 makes 08:00:00 to 09:00:00 the first window, so the first run covers the hour leading up to 09:00:00.
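
For reference, here is a sketch of what a trigger definition for that 08:00:00 to 09:00:00 first window could look like, built as a Python dictionary and printed as JSON. It assumes the documented rule that the first window runs from startTime to startTime plus one interval, which is why startTime is written as 08:00 here. The trigger name, pipeline name and parameter names are hypothetical, and the `Z` suffix assumes the times in the question are UTC.

```python
import json

# Hypothetical names: "HourlyWindowTrigger", "MyPipeline", "windowStart", "windowEnd".
trigger_definition = {
    "name": "HourlyWindowTrigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            # Start of the first window, not the time of the first run.
            "startTime": "2024-05-20T08:00:00Z",
            "maxConcurrency": 1,
        },
        "pipeline": {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "MyPipeline",
            },
            # Hand the window boundaries to the pipeline as parameters.
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime",
            },
        },
    },
}

print(json.dumps(trigger_definition, indent=2))
```

Passing the window boundaries as pipeline parameters lets the pipeline filter its source data by window instead of relying on the wall-clock time at which the run happens to start.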

    Hope this helps. Do let us know if you have any further queries.




  2. Amira Bedhiafi 17,866 Reputation points
    2024-05-20T11:12:31.0366667+00:00

    I will answer your questions in order:

    Question 1:

    1. Trigger Time: the time shown for a trigger run in the Azure portal. As you observed, it is always very close to the window end time, so it reflects when the run was triggered, not when you happen to be viewing the page.
    2. @trigger().outputs.windowStartTime: represents the start time of the current tumbling window. For an hourly window, if your window starts at 09:00, the windowStartTime would be 09:00 for that window.
    3. @trigger().outputs.windowEndTime: represents the end time of the current tumbling window. Continuing the previous example, if the window starts at 09:00 and it's an hourly window, the windowEndTime would be 10:00 for that window.
    4. @trigger().scheduledTime: is the time at which the trigger was scheduled to invoke the pipeline run. A tumbling window trigger fires once a window has closed, so for an hourly trigger starting at 09:00 the scheduledTime falls at the end of each window, e.g., 10:00, 11:00, 12:00, etc.
    5. @trigger().startTime: This is the time at which the trigger actually fired to invoke the pipeline run. This can be slightly later than scheduledTime due to system processing delays.

    Questions 2 & 3:

    • Trigger Time vs. scheduledTime/startTime:
      • Trigger Time is the time shown for the trigger run in the portal; as you observed, it is very close to the window end time
      • scheduledTime is the exact scheduled moment for the trigger to fire
      • startTime is the actual moment when the trigger fires, which can be slightly after the scheduledTime
    • windowStartTime, windowEndTime, and scheduledTime:
      • scheduledTime is aligned with the windowEndTime, because a window is processed only after it has closed
      • In an hourly tumbling window, if the window is configured to start at 09:00, then:
        • windowStartTime for that window is 09:00
        • windowEndTime for that window is 10:00
        • scheduledTime for that window is 10:00, as it marks the end of the window.

    Question 4: Based on your scenario:

    • Current time: 2024-05-20T08:30:00
    • Desired first occurrence: 2024-05-20T09:00:00
    • Hourly tumbling window trigger

    To set this up:

    • Start Time: You should set your start time to 2024-05-20T08:00:00. The first window begins at the trigger's start time, and the run for a window fires when that window ends. Setting the start time to 08:00:00 therefore makes the first window end, and the first run fire, at 09:00:00.
    • Window Configuration: Your windows will be:
      • First window: Starts at 08:00:00, Ends at 09:00:00 (its run fires at about 09:00:00, which is the first occurrence you want)
      • Second window: Starts at 09:00:00, Ends at 10:00:00 (its run fires at about 10:00:00)
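
To compare candidate start times for this scenario, the following sketch computes when the next run would become due at 08:30:00. It is only an illustration: `closed_windows` and `next_run_due` are hypothetical helpers, and the sketch assumes the run for each window becomes due when that window ends.

```python
from datetime import datetime, timedelta


def closed_windows(trigger_start, interval, now):
    """Count the windows that have fully elapsed by `now`."""
    if now < trigger_start:
        return 0
    return (now - trigger_start) // interval


def next_run_due(trigger_start, interval, now):
    """Assumed due time of the next run: the end of the first window not yet closed."""
    return trigger_start + (closed_windows(trigger_start, interval, now) + 1) * interval


now = datetime(2024, 5, 20, 8, 30, 0)
one_hour = timedelta(hours=1)

# Candidate start times: one hour before the desired first run, and the desired time itself.
for hour_of_day in (8, 9):
    candidate = datetime(2024, 5, 20, hour_of_day, 0, 0)
    print(candidate.isoformat(), "->", next_run_due(candidate, one_hour, now).isoformat())
```

Under that assumption, a start time of 08:00:00 gives a first run due at 09:00:00, while a start time of 09:00:00 pushes the first run to 10:00:00. A start time further in the past would leave already-closed windows, which a tumbling window trigger runs as backfill.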

    More links:

    https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers

    https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

    https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-trigger