September 2011

Volume 26 Number 09

Data Points - Second-Level Caching in the Entity Framework and AppFabric

By Julie Lerman | September 2011

The Entity Framework (EF) ObjectContext and DbContext maintain state information about entities they’re managing. But once the context goes out of scope, that state information is gone. This type of caching is referred to as first-level caching and is only available for the lifetime of a transaction. If you’re writing distributed applications using the EF where the context doesn’t stick around, and therefore your state information isn’t continuously available, the first-level cache likely won’t suffice to support your demands. This is typically the case with Web applications and services, or even when you’re using some type of repository pattern implementation where a long-running context isn’t available.

Why the EF Can Benefit from Second-Level Caching

Why should you care about having access to a representation of the original state across processes? One of the great benefits of the EF is its ability to automatically generate database persistence commands (inserts, updates and deletes) based on the state information found in the context. But if that state information is unavailable, the EF has nothing to do when it’s time to call SaveChanges. Developers, including myself, have been trying to work around this limitation since the EF was first introduced in 2006.

Second-level caches are instrumental in solving this type of problem. These caches exist outside of the transaction—often outside of the application—and therefore are available to any context instance. And second-level caching is a commonly used coding pattern for caching data for various uses.

Rather than write your own way of caching data, you can use existing caching mechanisms such as memcached (memcached.org), or the caching support in Microsoft AppFabric (available in Windows Server as well as in Windows Azure). These services provide the infrastructure for caching so you don’t have to sweat the details. And they expose APIs that make it easy for programmers to read, store and expire data in the cache.

If you have a highly transactional system that can benefit from a cache to avoid repeatedly hitting the database for commonly queried data, you’ll also find yourself looking for a caching solution. This is a great way to enhance performance when using data that’s modified infrequently—for example, reference data or a list of players on a sports team.

Figure 1 shows the first-level cache maintained within an EF context, as well as various contexts accessing a common second-level cache.

Figure 1 First-Level Caching Happens Inside a Transactional Context and Second-Level Caching Is External

Using the EF Caching Provider to Add Second-Level Caching

Designing the logic for reading, storing and expiring cache data takes a bit of work. You’d want to do this when working with the EF when you’re querying for data or storing data. When executing a query against the context, you’ll want to first see if that data exists in the cache so you don’t have to waste resources on a database call. When updating data by using a context’s SaveChanges method, you’ll want to expire and possibly refresh the data in the cache. And working with the cache is more complex than simply reading and writing data. There are plenty of other considerations to take into account. There’s an in-depth article from the Association for Computing Machinery (ACM) that lays out the complexities of ORM caching, “Exposing the ORM Cache: Familiarity with ORM caching issues can help prevent performance problems and bugs” (bit.ly/k5bzd1). I won’t attempt to repeat the pros, cons and hot points outlined in the article. Instead, I’ll focus on implementation.
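The read and write paths described above amount to what is often called a cache-aside pattern. The following is a minimal sketch of that flow, not the way the EF or any caching provider actually implements it. QueryResultCache and its members are hypothetical names, the loadFromDatabase delegate stands in for a real database call, and a Dictionary stands in for a real second-level cache.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of cache-aside logic; not part of the EF.
public class QueryResultCache
{
  private readonly Dictionary<string, object> _results =
    new Dictionary<string, object>();

  // Read path: return the cached result when there is one; otherwise run
  // the query (loadFromDatabase stands in for the database call) and
  // remember the result for next time.
  public object GetOrLoad(string query, Func<object> loadFromDatabase)
  {
    object result;
    if (!_results.TryGetValue(query, out result))
    {
      result = loadFromDatabase();
      _results[query] = result;
    }
    return result;
  }

  // Write path: after saving changes that affect a query's data, expire
  // the cached result so the next read goes back to the database.
  public void Expire(string query)
  {
    _results.Remove(query);
  }
}
```

Calling GetOrLoad twice with the same query text runs the delegate only once; calling Expire in between forces a second trip to the database. A real cache also has to deal with expiration times, memory pressure and concurrent access, which this sketch ignores.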

The EF doesn’t have built-in support for working with second-level caches. That functionality would make the most sense as part of the ObjectContext and DbContext logic when they’re about to interact with the database. But implementing the caching while taking into account the various issues discussed in the ACM article is non-trivial, especially with the lack of obvious extensibility points in the EF. One of the features that’s frequently highlighted as a big difference between the EF and NHibernate is the fact that NHibernate has built-in support for implementing second-level caching.

But all is not lost! Enter the EF providers and the brainy Jarek Kowalski (j.mp/jkefblog), former member of the EF team.

The EF provider model is the key to how the EF is able to support any relational database—as long as there’s a provider written for that database that includes EF support. There are a slew of third-party providers allowing you to use the EF with a growing array of databases (SQL Server, Oracle, Sybase, MySQL and Firebird are just some examples).

In the EF, the ObjectContext talks to the lower-level EntityClient API, which communicates with the database provider to work out database-specific commands and then interacts with the database. When the database is returning data (as a result of queries or commands that update store-generated values), the path is reversed, as shown in Figure 2.

Figure 2 Flow from the EF Context Through an ADO.NET Provider to Get to the Database

The spot where the provider lives is pliable, enabling you to inject additional providers between the EntityClient and the database. These are referred to as provider wrappers. You can learn more about writing ADO.NET providers for the EF or other types of providers on the EF team blog post, “Writing an EF-Enabled ADO.NET Provider” (bit.ly/etavcJ).
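To make the idea of a wrapper concrete, here is a small sketch of the underlying pattern: the wrapper exposes the same interface as the provider it wraps, adds its own behavior, and then delegates. This is only an illustration under simplifying assumptions. ICommandExecutor, FakeDatabaseProvider and TracingWrapper are hypothetical names, and a real EF provider wrapper works against the ADO.NET provider classes rather than a one-method interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the provider layer that executes commands.
public interface ICommandExecutor
{
  string Execute(string commandText);
}

// Stand-in for the real database provider (for example, SqlClient).
public class FakeDatabaseProvider : ICommandExecutor
{
  public string Execute(string commandText)
  {
    return "result of: " + commandText;
  }
}

// The wrapper implements the same interface as the provider it wraps,
// so the caller cannot tell that it sits in between.
public class TracingWrapper : ICommandExecutor
{
  private readonly ICommandExecutor _inner;
  private readonly List<string> _log = new List<string>();

  public TracingWrapper(ICommandExecutor inner)
  {
    if (inner == null) throw new ArgumentNullException("inner");
    _inner = inner;
  }

  public IList<string> Log { get { return _log; } }

  public string Execute(string commandText)
  {
    _log.Add(commandText);              // extra behavior injected here
    return _inner.Execute(commandText); // then delegate to the wrapped provider
  }
}
```

Because the wrapper and the provider share an interface, wrappers can be stacked: one wrapper can wrap another, each adding its own behavior before the command reaches the database.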

A few years ago, Kowalski used his deep knowledge of the EF providers to write a provider that captures messages between the EntityClient and the ADO.NET provider of choice (whether that’s SqlClient, MySQL Connector or another) and injects logic to interact with a second-level caching mechanism. The wrapper is extensible. It provides underlying logic for any type of caching solution, but then you need to implement a class that bridges between this wrapper and the caching solution. The provider sample works with an in-memory cache, and the solution has a sample adapter to use “Velocity,” the code name for Microsoft distributed caching. Velocity eventually became the caching mechanism in Microsoft Windows Server AppFabric.

Building an EFCachingProvider Adapter for Windows Server AppFabric

The EFCachingProvider was recently updated for the EF 4. The Tracing and Caching Provider Wrappers for Entity Framework page (bit.ly/zlpIb) includes great samples and documentation, so there’s no need to repeat all of that here. However, the Velocity adapter was removed and there was no replacement to use the caching in AppFabric.

AppFabric lives in two places: Windows Server and Windows Azure. I’ve recreated the provider class that worked with Velocity so that it will now work with the caching in Windows Server AppFabric, and I’ll share how to accomplish this yourself.

First, be sure you’ve installed the EF provider wrappers from bit.ly/zlpIb. I’ve worked in the example solution, which contains the projects for the provider wrappers (EFCachingProvider, EFTracingProvider and EFProviderWrapperToolkit). There are also some client projects that test out the final caching functionality. The InMemoryCache provider is the default caching strategy and is built into the EFCachingProvider. Also highlighted in that project in Figure 3 is ICache.cs. InMemoryCache implements this interface, and so should any other adapter you create to use other caching mechanisms, such as the AppFabricCache adapter that I created.

Figure 3 ICache and InMemoryCache Are Core Classes in the EFCachingProvider

In order to develop for Windows Server AppFabric, you’ll need the AppFabric cache client assemblies and a minimal installation of AppFabric on your development machine. See the MSDN Library topic, “Walkthrough: Deploying Windows Server AppFabric in a Single-Node Development Environment,” at bit.ly/lwsolW, for help with this task. Be warned that it’s a bit involved. I’ve done it myself on two development machines.

Now you can create an adapter for Windows Server AppFabric. This is very close to the original Velocity3 adapter, but I did spend a bit of time learning how to work with the AppFabric client API in order to get these stars aligned. If you’re creating an adapter for a different caching mechanism, you’ll need to adjust accordingly to that cache’s API.

Another critical piece to the puzzle is to extend your ObjectContext class. I hope to try this out with an EF 4.1 DbContext soon, but this will necessitate modifying the underlying logic of the EFCachingProvider.

You use the same code to extend the context regardless of which implementation of ICache you’re working with. The extended class inherits from your context class (which, in turn, inherits from ObjectContext) and then exposes some extension methods from the EFCachingProvider. These extension methods enable the context to interact directly (and automatically) with the caching provider. Figure 4 shows an example in the solution that extends NorthwindEFEntities, a context for a model built against the Northwind database.

Figure 4 Extending an Existing Class that Inherits from ObjectContext, NorthwindEFEntities

using EFCachingProvider;
using EFCachingProvider.Caching;
using EFProviderWrapperToolkit;

namespace NorthwindModelDbFirst
{
  public partial class ExtendedNorthwindEntities : NorthwindEFEntities
  {
    // Uses the connection string named in the config file.
    public ExtendedNorthwindEntities()
      : this("name=NorthwindEFEntities")
    {
    }

    // Wraps the entity connection so that it flows through the caching provider.
    public ExtendedNorthwindEntities(string connectionString)
      : base(EntityConnectionWrapperUtils.
        CreateEntityConnectionWithWrappers(
          connectionString, "EFCachingProvider"))
    {
    }

    // Digs the caching connection out from under the wrapped connection.
    private EFCachingConnection CachingConnection
    {
      get { return this.UnwrapConnection<EFCachingConnection>(); }
    }

    // The cache used by this context instance.
    public ICache Cache
    {
      get { return CachingConnection.Cache; }
      set { CachingConnection.Cache = value; }
    }

    // The caching policy used by this context instance.
    public CachingPolicy CachingPolicy
    {
      get { return CachingConnection.CachingPolicy; }
      set { CachingConnection.CachingPolicy = value; }
    }
  }
}

I’ve added a Class Library project to the solution called EFAppFabricCacheAdapter. That project needs references to the EFCachingProvider as well as two of the AppFabric assemblies: Microsoft.ApplicationServer.Caching.Core and Microsoft.ApplicationServer.Caching.Client. Figure 5 shows my adapter class, AppFabricCache, which emulates the original VelocityCache.

Figure 5 Adapter that Interacts with Windows Server AppFabric

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using EFCachingProvider.Caching;
using Microsoft.ApplicationServer.Caching;

namespace EFAppFabricCacheAdapter
{
  public class AppFabricCache : ICache
  {
    private DataCache _cache;

    public AppFabricCache(DataCache cache)
    {
      _cache = cache;
    }

    public bool GetItem(string key, out object value)
    {
      key = GetCacheKey(key);
      value = _cache.Get(key);

      return value != null;
    }

    // Note: slidingExpiration is not used by this adapter.
    public void PutItem(string key, object value,
      IEnumerable<string> dependentEntitySets,
      TimeSpan slidingExpiration, DateTime absoluteExpiration)
    {
      key = GetCacheKey(key);
      _cache.Put(key, value, absoluteExpiration - DateTime.Now,
        dependentEntitySets.Select(c => new DataCacheTag(c)).ToList());

      // Record the key in one region per dependent entity set
      // so that InvalidateSets can find it later.
      foreach (var dep in dependentEntitySets)
      {
        CreateRegionIfNeeded(dep);
        _cache.Put(key, " ", dep);
      }
    }

    public void InvalidateSets(IEnumerable<string> entitySets)
    {
      // Go through the list of objects in each of the sets.
      foreach (var dep in entitySets)
      {
        foreach (var val in _cache.GetObjectsInRegion(dep))
        {
          _cache.Remove(val.Key);
        }
      }
    }

    public void InvalidateItem(string key)
    {
      key = GetCacheKey(key);

      DataCacheItem item = _cache.GetCacheItem(key);
      if (item == null)
      {
        // Nothing is cached under this key, so there is nothing to remove.
        return;
      }

      _cache.Remove(key);

      // Also remove the marker entries from the entity set regions.
      foreach (var tag in item.Tags)
      {
        _cache.Remove(key, tag.ToString());
      }
    }

    // Creates a hash of the query to store as the key
    private static string GetCacheKey(string query)
    {
      byte[] bytes = Encoding.UTF8.GetBytes(query);
      string hashString = Convert
        .ToBase64String(MD5.Create().ComputeHash(bytes));
      return hashString;
    }

    private void CreateRegionIfNeeded(string regionName)
    {
      try
      {
        _cache.CreateRegion(regionName);
      }
      catch (DataCacheException de)
      {
        // An existing region is fine; anything else is a real error.
        if (de.ErrorCode != DataCacheErrorCode.RegionAlreadyExists)
        {
          throw;
        }
      }
    }
  }
}

The class uses the Microsoft.ApplicationServer.Caching.DataCache to fulfill the required implementation of ICache. Most notable is the use of AppFabric regions in the PutItem and InvalidateSets methods. When a new item is stored in the cache, the adapter also adds it to a region, or group, that’s defined by all entities in a particular entity set. In other words, if you have a model with Customer, Order and LineItem, then your Customer instances will be cached in a Customers region, Order instances in a region called Orders and so on. When a particular item is invalidated, rather than looking for that particular item and invalidating it, all of the items in the region are invalidated.
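The region bookkeeping is easier to follow without the AppFabric API in the way. The sketch below is a simplified in-memory model of the same idea, not the adapter itself: RegionTrackingCache is a hypothetical class, plain dictionaries stand in for the DataCache and its regions, and expiration is left out entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the region bookkeeping; the real
// adapter delegates this work to the AppFabric DataCache.
public class RegionTrackingCache
{
  private readonly Dictionary<string, object> _items =
    new Dictionary<string, object>();

  // One "region" per entity set, holding the keys of dependent items.
  private readonly Dictionary<string, HashSet<string>> _regions =
    new Dictionary<string, HashSet<string>>();

  public void PutItem(string key, object value,
    IEnumerable<string> dependentEntitySets)
  {
    _items[key] = value;
    foreach (string set in dependentEntitySets)
    {
      HashSet<string> keys;
      if (!_regions.TryGetValue(set, out keys))
      {
        keys = new HashSet<string>();
        _regions[set] = keys;
      }
      keys.Add(key);
    }
  }

  public bool GetItem(string key, out object value)
  {
    return _items.TryGetValue(key, out value);
  }

  // Removes every item registered under any of the given entity sets.
  public void InvalidateSets(IEnumerable<string> entitySets)
  {
    foreach (string set in entitySets)
    {
      HashSet<string> keys;
      if (!_regions.TryGetValue(set, out keys))
      {
        continue; // nothing was ever cached for this entity set
      }
      foreach (string key in keys)
      {
        _items.Remove(key);
      }
      _regions.Remove(set);
    }
  }
}
```

With this model, a result that depends on both Customers and Orders is registered in both regions, so invalidating either entity set removes it, while results that depend only on other entity sets stay in the cache.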

It’s this use of regions that caused me to set aside my attempt to implement Windows Azure AppFabric support. At the time that I’m writing this column, Windows Azure AppFabric is still a CTP and doesn’t support regions. Because the underlying code of the caching provider is dependent on the regions and these methods, I was unable to easily create a provider implementation that would just work for Windows Azure AppFabric. You can, of course, call the InvalidateItem method yourself, but that would eliminate the benefit of the automated behavior of the provider.

Using the AppFabric Cache Adapter

There’s one last project to add, and that’s the project that exercises the adapter. The EFCachingProvider demo that’s part of the original solution uses a console app with a number of methods to test out the caching: SimpleCachingDemo, CacheInvalidationDemo and NonDeterministicQueryCachingDemo. In my added console app for testing out the AppFabricCache, you can use the same three methods with the same implementation. What’s interesting about this test is the code for instantiating and configuring the AppFabricCache that will be used by the extended context in those three methods.

An AppFabric DataCache needs to be created by first identifying an AppFabric server Endpoint, then using a DataCacheFactory configured with that Endpoint to create the DataCache. Here’s the code to do that:

private static ICache CreateAppFabricCache()
{
  var server = new List<DataCacheServerEndpoint>();
  server.Add(new DataCacheServerEndpoint("localhost", 22233));
  var conf = new DataCacheFactoryConfiguration();
  conf.Servers = server;
  DataCacheFactory fac = new DataCacheFactory(conf);
  return new AppFabricCache(fac.GetDefaultCache());
}

Note that I’m hardcoding the Endpoint details for the simplicity of this example, but you probably won’t want to do that in a production application. Once you’ve created the DataCache, you then use it to instantiate an AppFabricCache.
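One way to avoid the hardcoded values is to read the Endpoint from a setting. The following is only a sketch of that idea: CacheEndpointSettings is a hypothetical helper, and the "host:port" format is an assumption made for this example rather than anything AppFabric defines.

```csharp
using System;

// Hypothetical helper: parses a "host:port" setting so the endpoint does
// not have to be compiled into the application.
public class CacheEndpointSettings
{
  public string Host { get; private set; }
  public int Port { get; private set; }

  public static CacheEndpointSettings Parse(string setting)
  {
    if (string.IsNullOrEmpty(setting))
    {
      throw new ArgumentException("Endpoint setting is missing.", "setting");
    }

    // Split on the last colon; everything after it must be a number.
    int separator = setting.LastIndexOf(':');
    int port;
    if (separator <= 0 ||
      !int.TryParse(setting.Substring(separator + 1), out port))
    {
      throw new FormatException("Expected the form host:port.");
    }

    return new CacheEndpointSettings
    {
      Host = setting.Substring(0, separator),
      Port = port
    };
  }
}
```

The setting could come from an application configuration file or an environment variable. The resulting Host and Port values would then be passed to the DataCacheServerEndpoint constructor in place of "localhost" and 22233.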

With this cache in hand, I can pass it into the EFCachingProvider and apply configurations, such as a DefaultCachingPolicy:

ICache dataCache = CreateAppFabricCache();
EFCachingProviderConfiguration.DefaultCache = dataCache;
EFCachingProviderConfiguration.DefaultCachingPolicy = CachingPolicy.CacheAll;

Finally, when I instantiate my extended context, it will automatically look for a caching provider, finding the AppFabricCache instance that I just set as the default. This will cause caching to be active using whatever configuration settings you applied. All you need to do is go about your business with the context: querying, working with objects and calling SaveChanges. Thanks to the extension methods that bind your context to the EFCachingProvider and the DataCache instance that you attached, all of the caching will happen automatically in the background. Note that the CacheAll CachingPolicy is fine for demos, but you should consider using a more fine-tuned policy so that you aren’t caching data unnecessarily.

The EFCachingProvider has been designed with extensibility in mind. As long as the target caching mechanism you want to use supports the standard implementations to store, retrieve, expire and group data, you should be able to follow the pattern of this adapter to provide a shared cache for your applications that use the EF for data access.


Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other Microsoft .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of the highly acclaimed book, “Programming Entity Framework” (O’Reilly Media, 2010). Follow her on Twitter at twitter.com/julielerman.

Thanks to the following technical experts for reviewing this article: Jarek Kowalski and Srikanth Mandadi