다음을 통해 공유


Dangers of time travel

I tend to partition StreamInsight operators into three classes: the basic “temporal/relational” operators; the “time travel” operators, and; the “elephant” operators. The basic operators illustrate the rules. The time travel operators bend the rules. The scan operators break them altogether. While I’d like to focus on time travel today, we can’t bend the rules before knowing what they are. Let’s begin with the basics!

A StreamInsight query describes how the results of a normal relational query change over time. The start time of an event indicates the time that a row is inserted into the relational input. The end time indicates when that row is deleted. StreamInsight’s basic temporal/relational operators can be understood in those terms. For instance, the StreamInsight ‘join’ operator produces output for coincident input tuples matching some predicate. Tuples are coincident if their time intervals overlap. Another way of looking at it: all matching tuples across all time logically produce output, but that output may have an empty temporal intersection. The basic temporal/relational operators:

  • Join
  • Left anti-join
  • Selection
  • Projection
  • Group and apply
  • Snapshot window operators (aggregate, top-K, user-defined operator).

Time travel operators bend the rules because they allow an input event to contribute to results outside of their original time interval! The time travel operators:

  • Hopping/tumbling window operators, because an event contributes to the window result even if was active for only a portion of the window.
  • Clip event duration, because the clipper masks the clippee for the rest of time.
  • Alter event lifetime, because it changes the time interval for an event.

The fundamental time travel operator is alter event lifetime because the others can be expressed in terms of it. (I’ll leave this as an exercise for the reader). A hint for hopping window operators: one way of thinking about a hopping window is that events belonging to the same window are reassigned to the same time interval so that they can contribute to the same output time interval!) While there are several variations on the alter lifetime operator, they all boil down to the following components:

  • An input stream.
  • An (optional) function that modifies the start time of events in the stream.
  • An (optional) function that modifies the duration of events in the stream.

If you do modify start times and durations, make sure that the modifications to start and end times are monotonic. In other words, make sure that the changes do not affect the relative order of start and end edge events. Otherwise, you risk introducing CTI violations. CTI violations bring down the query: they indicate that a new event has arrived affecting a result that has already been committed! Like traveling through time to alter the past, violating CTIs causes (can I even use that word here?) bad things to happen.

Shifting all events forward or backward by a fixed amount is safe because the CTIs are moving with the events*:

var shiftedBack = source.ShiftEventTime(_ => TimeSpan.FromSeconds(-1));
var shiftedForward = source.ShiftEventTime(_ => TimeSpan.FromDays(7));

Making time pass faster or slower doesn’t violate the monotonicity requirement, but depending on the nature of the input can introduce CTI violations. If the input stream contains end edge events, then they could also violate CTIs if not modified, as in the following example which attempts to double the speed at which time passes:

var startTime = DateTimeOffset.UtcNow;

var source = this.Application
    .DefineEnumerable(() => new[]
        {
            EdgeEvent.CreateCti<int>(startTime),
            EdgeEvent.CreateStart(startTime, 0),
            EdgeEvent.CreateCti<int>(startTime.AddSeconds(10)),
            EdgeEvent.CreateEnd(startTime, startTime.AddSeconds(10), 0),
            EdgeEvent.CreateCti<int>(startTime.AddSeconds(11)),
        })
    .ToStreamable(AdvanceTimeSettings.StrictlyIncreasingStartTime);

var fastForward = source.AlterEventStartTime(
    e => e.StartTime.AddTicks(e.StartTime.Ticks));

We can address the failure by ensuring that end edges also advance at double-speed:

var fastForward = source.AlterEventLifetime(
    e => e.StartTime.AddTicks(e.StartTime.Ticks),
    e => TimeSpan.FromTicks((e.EndTime - e.StartTime).Ticks * 2L));

There is a very important loophole that can be exploited for duration modifications. If the duration selector returns a fixed value for every event, StreamInsight no longer needs to produce end edges internally, and those non-existent end edges no longer pose a CTI violation risk:

var fastForwardFixedInterval = source.AlterEventLifetime(
    e => e.StartTime.AddTicks(e.StartTime.Ticks),
    _ => TimeSpan.FromSeconds(1));

In summary, there are several completely safe forms of time travel in StreamInsight:

  1. Shift all events forward or backward in time by a fixed amount.
  2. Established a fixed duration for events.
  3. Apply modifications that do not affect the relative order of start and end edges.
  4. The built-in time travel operators, clip and hopping window.

While it may be safe to stray from these rules for specific data and query combinations, I suggest playing it safe!

* Notice that the start time selector function does not allow behavior to be affected by event payload. This is because CTI events, which are also modified by the selector, do not carry payloads.

Regards,
Colin Meek/The StreamInsight Team