Cutting Edge - Event Sourcing for the Common Application

By Dino Esposito | September 2015

It’s such a common and natural activity you might not even give it much thought. When you think of data storage, you naturally consider a format that simply preserves the current state of your data. While some large-scale systems, such as those in the insurance and banking industries, carefully track and record every software action, saving the current data state is more than enough for most applications and Web sites.

The current-state approach takes a snapshot of the system state and makes it persistent. The data typically resides in a relational database. That’s enough to conduct new transactions and retrieve the results of past transactions. Scenarios where current state storage is insufficient haven’t been too common for the past decade.

These days, the business landscape is rapidly changing. Tracking business and domain events is becoming more of a necessity. Event Sourcing (ES) is a pattern that affects storage architecture and the manner in which you retrieve and insert stored data. ES isn’t just about auditing and recording business-relevant events in a persisted domain. It’s about using a lower abstraction level for saving your data and using ad hoc tools and patterns to create multiple data projections.

ES might look like a smart and cool way to log and audit business functions, but it’s really a new storage model theory that’s as relevant as the relational model has been since the beginning. It might even have a greater impact on modern software than NoSQL storage. ES isn’t an alternative to today’s relational and NoSQL products. You implement ES on top of relational databases and NoSQL data stores. ES is about changing your vision of application storage and using events instead of stateful values as the data tokens.

When Does Event Sourcing Help?

A basic but common way to go beyond current state storage is to track update history. Imagine a simple bookstore application. You have a description property on each book and you give your users editing permission. Should you keep track of an old description when a user enters a new one?

Everyone’s requirements might be different, but imagine that tracking updates in this example is important. How would you implement that? One option is to store the current book state and log any update details in a separate table. You could have one record for each update. The update record would contain the update delta, such as the old and new value of each modified column.
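To make the first option concrete, here’s a minimal C# sketch, assuming an in-memory dictionary and list stand in for the Books table and the separate update table. BookUpdate, Book and BookRepository are hypothetical names for illustration. The point is that the current state is overwritten while the delta is appended to the log.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delta record: one row per update, holding old and new values.
public record BookUpdate(int BookId, string Column, string OldValue, string NewValue, DateTime Timestamp);

public class Book
{
    public int Id { get; init; }
    public string Description { get; set; } = "";
}

// Hypothetical repository: current state in one "table", history in another.
public class BookRepository
{
    private readonly Dictionary<int, Book> _books = new();   // current-state table
    private readonly List<BookUpdate> _updateLog = new();    // separate update table

    public IReadOnlyList<BookUpdate> UpdateLog => _updateLog;

    public void Add(Book book) => _books[book.Id] = book;

    public Book Get(int id) => _books[id];

    public void UpdateDescription(int bookId, string newDescription)
    {
        var book = _books[bookId];
        if (book.Description == newDescription) return;        // nothing changed, nothing to log

        // Record the delta before overwriting the current state.
        _updateLog.Add(new BookUpdate(bookId, nameof(Book.Description),
            book.Description, newDescription, DateTime.UtcNow));
        book.Description = newDescription;
    }
}
```

Note that the Books table itself still only knows the latest description; the history lives entirely in the update log.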

You could also do this a different way. The Books table could contain multiple records for the same book marked with a given ID. Each record would represent a timestamped state listed in order (see Figure 1).

Figure 1 Multiple Records Hold Entity History

This scenario requires an ad hoc API to read the current record state. It’s not simply a query in the repository that selects the record by ID. You have to pick the one with the latest timestamp or the highest progressive update number. Also, the union of all events that relate to a given data entity forms a stream. That event stream is a key concept in ES.
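Here’s a minimal sketch of the multiple-records approach. BookRow and BookHistoryTable are hypothetical, and an in-memory list stands in for the Books table. Against a real database, GetCurrent would become a query that filters by ID, orders by timestamp and takes the top row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row: many rows share a BookId, each one a timestamped state.
public record BookRow(int BookId, string Description, DateTime Timestamp);

public class BookHistoryTable
{
    private readonly List<BookRow> _rows = new();

    // Every save appends a new row; existing rows are never overwritten.
    public void Save(int bookId, string description, DateTime timestamp) =>
        _rows.Add(new BookRow(bookId, description, timestamp));

    // The ad hoc read API: selecting by ID isn't enough, so take the
    // matching row with the latest timestamp (null if the ID is unknown).
    public BookRow? GetCurrent(int bookId) =>
        _rows.Where(r => r.BookId == bookId)
             .OrderBy(r => r.Timestamp)
             .LastOrDefault();

    // All rows for one book, in order, form that book's stream.
    public IEnumerable<BookRow> GetStream(int bookId) =>
        _rows.Where(r => r.BookId == bookId).OrderBy(r => r.Timestamp);
}
```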

ES helps whenever the business demands that you track a series of events. While ES might look like a cross-cutting concern such as logging or auditing, it’s quite different. It doesn’t log events to profile code or track exceptions. It just tracks business events. And it isn’t a cross-cutting concern, but an architectural decision that applies primarily to storage.

Event Sourcing Defined

In a nutshell, ES is about using events as the primary data source. ES isn’t necessarily useful for every application, which is why developers have blissfully ignored it for decades. If ES seems useless today, it’s mostly because you don’t need it yet.

I like to summarize the need for ES as follows: If a domain expert needs to track the sequence of events the software can produce, then event sourcing is a viable option. Otherwise, events may still be useful to express workflows and concatenate pieces of business logic. In this case, though, events aren’t first-class citizens in the domain and may not be persisted. This is the mainstream scenario today.

Let’s see what you do when events are the primary data source of your application. ES affects two aspects of storage: persistence and queries. Persistence is characterized by three core operations—Insert, Update and Delete. In an ES scenario, Insert is nearly the same as in a classic system that persists current entity state. The system receives a request and writes a new event to the store. The event contains a unique identifier (for example, a GUID), type name or code that identifies the type of the event, a timestamp and associated information.

The Update consists of another Insert in the same container of data entities. The new entry simply records the change: which properties have changed, their new values and, if relevant in the business domain, why and how they changed. Once an update has been performed, the data store evolves, as shown in Figure 2.

Figure 2 A New Record Indicates Update to Entity with ID #1

The Delete operation works the same way as an Update, except that the new event carries different information, so it’s clear it was a deletion.
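The following C# sketch reduces Insert, Update and Delete to appends. It assumes a hypothetical StoredEvent envelope with the fields described earlier (a GUID, a type name, a timestamp and the associated data, here a simple property bag) and uses an in-memory list as the only container. BookEvents and the event type names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event envelope: unique ID, entity ID, type name, timestamp, data.
public record StoredEvent(
    Guid EventId,
    Guid EntityId,
    string EventType,
    DateTime Timestamp,
    IReadOnlyDictionary<string, string> Data);

public class BookEvents
{
    private readonly List<StoredEvent> _events = new();   // rows are only ever added
    public IReadOnlyList<StoredEvent> All => _events;

    private void Append(Guid bookId, string type, Dictionary<string, string> data) =>
        _events.Add(new StoredEvent(Guid.NewGuid(), bookId, type, DateTime.UtcNow, data));

    // Insert: the first event for a new entity.
    public Guid Create(string title, string description)
    {
        var id = Guid.NewGuid();
        Append(id, "BookCreated", new() { ["Title"] = title, ["Description"] = description });
        return id;
    }

    // Update: another insert carrying the changed property and why it changed.
    public void ChangeDescription(Guid bookId, string newDescription, string reason) =>
        Append(bookId, "DescriptionChanged", new() { ["Description"] = newDescription, ["Reason"] = reason });

    // Delete: also an insert; the event type is what marks the deletion.
    public void Delete(Guid bookId) =>
        Append(bookId, "BookDeleted", new());
}
```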

Making updates like this immediately poses a few issues when it comes to queries. How would you know if a given record exists or what its current state might be? That requires an ad hoc query layer that conceptually selects all records with a matching ID, then analyzes the data set event after event.

For example, it could create a new data entity based on the content of the Created event. Then it would replay all successive steps and return what remains at the end of the stream. This technique is known as event replay. The plain replay of events to rebuild the state might raise some concerns about performance.
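Here’s a minimal sketch of event replay, assuming hypothetical BookCreated, DescriptionChanged and BookDeleted event types. Replay starts from the Created event, applies each later event in order and returns whatever state remains at the end of the stream.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event types for a book entity.
public abstract record BookEvent(Guid BookId);
public record BookCreated(Guid BookId, string Title, string Description) : BookEvent(BookId);
public record DescriptionChanged(Guid BookId, string Description) : BookEvent(BookId);
public record BookDeleted(Guid BookId) : BookEvent(BookId);

public class BookState
{
    public Guid Id { get; private set; }
    public string Title { get; private set; } = "";
    public string Description { get; private set; } = "";
    public bool IsDeleted { get; private set; }

    // Event replay: start from nothing and apply every event in order.
    public static BookState? Replay(IEnumerable<BookEvent> stream)
    {
        BookState? state = null;
        foreach (var e in stream)
        {
            switch (e)
            {
                case BookCreated c:
                    state = new BookState { Id = c.BookId, Title = c.Title, Description = c.Description };
                    break;
                case DescriptionChanged d when state != null:
                    state.Description = d.Description;
                    break;
                case BookDeleted _ when state != null:
                    state.IsDeleted = true;
                    break;
            }
        }
        return state;   // null: no Created event, so the entity never existed
    }
}
```

A deleted book still yields a state after replay; it’s simply flagged, because the events that describe it never go away.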

Think about a bank account. A customer who opened a bank account years ago would have accumulated hundreds of operations and events since then. To get the current balance, you have to replay several hundred operations to rebuild the current account state. That might not always be practical.

There are workarounds for this scenario. The most important is creating snapshots. A snapshot is a record that saves the known state of the entity at a given time. This way, there’s no need to replay events dated before the snapshot.
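The following sketch compares a full replay with a snapshot-based rebuild for the bank account example. AccountEvent, Snapshot and AccountProjection are hypothetical, and each event is assumed to carry a sequence number so the code can tell which events are newer than the snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical account event: positive Amount = deposit, negative = withdrawal.
public record AccountEvent(int Sequence, decimal Amount);

// Hypothetical snapshot: the known balance as of a given stream position.
public record Snapshot(int LastSequence, decimal Balance);

public static class AccountProjection
{
    // Full replay: walk every event from the start of the stream.
    public static decimal BalanceByFullReplay(IEnumerable<AccountEvent> stream) =>
        stream.Sum(e => e.Amount);

    // With a snapshot: start from the saved balance and apply only
    // the events recorded after the snapshot was taken.
    public static decimal BalanceFromSnapshot(Snapshot snapshot, IEnumerable<AccountEvent> stream) =>
        snapshot.Balance + stream.Where(e => e.Sequence > snapshot.LastSequence).Sum(e => e.Amount);

    // Taking a snapshot is itself a replay up to a given position.
    public static Snapshot TakeSnapshot(IEnumerable<AccountEvent> stream, int upToSequence) =>
        new(upToSequence, stream.Where(e => e.Sequence <= upToSequence).Sum(e => e.Amount));
}
```

Both methods must return the same balance. In this in-memory sketch the Where clause still scans the list; against a real store you would query only the events whose sequence number is above the snapshot’s, which is where the saving comes from.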

ES isn’t bound to any technology or product, whether a particular relational database or NoSQL data store. ES does, however, require at least one special software component: the event store. The event store is essentially an event log. You can create one using your own code on top of any data store API of your choice.

An event store has two main characteristics. First and foremost, it’s an append-only data store. It doesn’t support updates and may optionally support only specific types of deletions. Second, an event store must be able to return the stream of events associated with a given key. You can create this layer of code yourself or use available tools and frameworks.
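A minimal in-memory event store with exactly those two characteristics might look like the following. InMemoryEventStore is a hypothetical class, not the API of any product: it exposes Append and ReadStream and nothing else, so there’s no way to update or delete an event through it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical append-only store: events are grouped into streams by key.
public sealed class InMemoryEventStore<TEvent>
{
    private readonly Dictionary<string, List<TEvent>> _streams = new();
    private readonly object _lock = new();

    // The only write operation: add an event to the end of a stream.
    public void Append(string streamKey, TEvent @event)
    {
        lock (_lock)
        {
            if (!_streams.TryGetValue(streamKey, out var stream))
            {
                stream = new List<TEvent>();
                _streams[streamKey] = stream;
            }
            stream.Add(@event);
        }
    }

    // Returns a copy of the stream, so callers can't rewrite history.
    public IReadOnlyList<TEvent> ReadStream(string streamKey)
    {
        lock (_lock)
        {
            return _streams.TryGetValue(streamKey, out var stream)
                ? stream.ToArray()
                : Array.Empty<TEvent>();
        }
    }
}
```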

Event Store Options

You can implement an event store using anything that works. It often uses a relational database or some flavor of a NoSQL data store as the persistence engine. If you do plan to use a relational database, you can have one table per entity type, with one row per event.

Events typically have different layouts. For example, each event might have a different number of properties to save, which makes it hard to work out a common schema for all rows. If a common schema resulting from the union of all possible columns is even possible and performs acceptably, this is an easy option to implement.

Otherwise, you can look into the columnstore index feature of SQL Server 2014, which stores table data column by column instead of row by row. Another option that works with any version of SQL Server is serializing event properties to a JSON object and storing it as a string in a single column.
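As a sketch of the JSON-column option, the following code maps events of different shapes onto one fixed row layout. It assumes System.Text.Json as the serializer (.NET 5 or later, which can deserialize records through their constructors). EventRow, EventRowMapper and the two event types are hypothetical; the Payload string is what would go into the single JSON column.

```csharp
using System;
using System.Text.Json;

// Hypothetical row layout: fixed metadata columns plus one JSON payload column.
public record EventRow(Guid EventId, Guid AggregateId, string EventType, DateTime Timestamp, string Payload);

// Two event types with different shapes; neither needs its own table columns.
public record BookCreated(string Title, string Description, decimal Price);
public record PriceChanged(decimal OldPrice, decimal NewPrice);

public static class EventRowMapper
{
    // Serialize the event-specific properties into the Payload column.
    public static EventRow ToRow<T>(Guid aggregateId, T @event) =>
        new(Guid.NewGuid(), aggregateId, typeof(T).Name, DateTime.UtcNow,
            JsonSerializer.Serialize(@event));

    // The EventType column tells the reader which class to deserialize into.
    public static object FromRow(EventRow row) => row.EventType switch
    {
        nameof(BookCreated) => (object)JsonSerializer.Deserialize<BookCreated>(row.Payload)!,
        nameof(PriceChanged) => JsonSerializer.Deserialize<PriceChanged>(row.Payload)!,
        _ => throw new NotSupportedException($"Unknown event type {row.EventType}")
    };
}
```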

In NoSQL jargon, a “document” is an object with a variable number of properties. Some NoSQL products specialize in storing documents. From a developer’s perspective, it couldn’t be easier. Create a class, fill it with values, and store it as is. The type of class is key information that links multiple events. If you go with NoSQL, then you just have an event object and save it.

Ongoing Projects

ES is a relatively young architectural approach. The standard tools that help write code on top of event-based data stores are still emerging. You can definitely arrange an ES solution on your own, but some ad hoc tools can help you deal with storage of events in a more structured way.

The primary benefit of using an event-aware data store is that the tool, much like a database, guarantees you only perform actions that read and append events in a way that preserves the business consistency of the event-sourcing approach. One framework specifically designed for storing events is the NEventStore project (neventstore.org). It lets you write and read back events, and it operates independently of the persistence engine. Here’s how you save an event:

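var store = Wireup.Init()
  .UsingSqlPersistence("connection")  // Name of the connection string in the config file
  .InitializeStorageEngine()          // Creates the event store schema if needed
  .UsingJsonSerialization()
  .Build();
// The stream is disposable; each commit needs its own unique commit ID.
using (var stream = store.CreateStream(aggregateId))
{
  stream.Add(new EventMessage { Body = eventToSave });
  stream.CommitChanges(Guid.NewGuid());
}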

To read events back, open the stream and loop through the collection of committed events.

Event Store (geteventstore.com) is another option; it exposes event streams through both a plain HTTP API and a .NET API. In ES jargon, an aggregate equates to a stream in the store. You can perform three basic operations on an event stream: write events; read the last event, a specific event or even a slice of events; and subscribe to get updates.

There are three types of subscriptions. A volatile subscription invokes a callback function every time an event is written to a given stream. A catch-up subscription notifies you of each event in the store starting from a given one and, after that, of any newly added event. Finally, the persistent subscription addresses the scenario in which multiple consumers are waiting for events to process. It guarantees that events are delivered to consumers at least once, but possibly multiple times and in unpredictable order.
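To clarify the difference between the first two subscription types, here’s a small single-threaded sketch. SubscribableStream is hypothetical and doesn’t reproduce the Event Store client API; it only mimics the delivery semantics. Persistent subscriptions, with competing consumers and at-least-once delivery, are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stream that mimics volatile and catch-up subscription semantics.
// Not thread-safe; meant only to show which events each subscriber sees.
public sealed class SubscribableStream<TEvent>
{
    private readonly List<TEvent> _events = new();
    private readonly List<Action<int, TEvent>> _subscribers = new();

    public void Append(TEvent e)
    {
        _events.Add(e);
        int position = _events.Count - 1;
        foreach (var notify in _subscribers) notify(position, e);   // push to live subscribers
    }

    // Volatile: the callback fires only for events written after subscribing.
    public void SubscribeVolatile(Action<TEvent> callback) =>
        _subscribers.Add((_, e) => callback(e));

    // Catch-up: deliver stored events from fromPosition on, then stay live.
    public void SubscribeCatchUp(int fromPosition, Action<int, TEvent> callback)
    {
        for (int i = fromPosition; i < _events.Count; i++) callback(i, _events[i]);
        _subscribers.Add(callback);
    }
}
```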

Wrapping Up

Event Sourcing uses events as the application data source. You don’t architect the application to save the last-known state of entities, but the list of relevant business events. The event data source stores data at a low level of abstraction. You need to apply projections to get from there to the actual entity state required for transactions and queries. A projection is the process of replaying events and performing some tasks. The most obvious projection is building the current state, but you can have any number or type of projections out of events.
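Here’s a sketch of two projections built from the same event stream, assuming hypothetical Deposited and Withdrawn events for a bank account. The first projection rebuilds the current balance; the second answers a question the current state alone can’t, namely how much was withdrawn in each month.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical account events; the same stream feeds every projection.
public record Deposited(DateTime On, decimal Amount);
public record Withdrawn(DateTime On, decimal Amount);

public static class Projections
{
    // Projection 1: the current state.
    public static decimal CurrentBalance(IEnumerable<object> stream)
    {
        decimal balance = 0m;
        foreach (var e in stream)
        {
            if (e is Deposited d) balance += d.Amount;
            else if (e is Withdrawn w) balance -= w.Amount;
        }
        return balance;
    }

    // Projection 2: total withdrawn per month, keyed as "yyyy-MM".
    public static Dictionary<string, decimal> WithdrawalsByMonth(IEnumerable<object> stream) =>
        stream.OfType<Withdrawn>()
              .GroupBy(w => w.On.ToString("yyyy-MM", CultureInfo.InvariantCulture))
              .ToDictionary(g => g.Key, g => g.Sum(w => w.Amount));
}
```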

Dino Esposito is the co-author of “Microsoft .NET: Architecting Applications for the Enterprise” (Microsoft Press, 2014) and “Programming ASP.NET MVC 5” (Microsoft Press, 2014). A technical evangelist for the Microsoft .NET Framework and Android platforms at JetBrains and frequent speaker at industry events worldwide, Esposito shares his vision of software at software2cents.wordpress.com and on Twitter at twitter.com/despos.

Thanks to the following technical expert for reviewing this article: Jon Arne Saeteras
