January 2012
Volume 27 Number 01
Forecast: Cloudy - Microsoft Azure Caching Strategies
By Joseph Fultz | January 2012
My two-step with caching started back during the dot-com boom. Sure, I had to cache bits of data at the client or in memory here and there to make things easier or faster for applications I had built up until that point. But it wasn’t until the Internet—and in particular, Internet commerce—exploded that my thinking really evolved when it came to the caching strategies I was employing in my applications, both Web and desktop alike.
In this column, I’ll map various Azure caching capabilities to caching strategies for output, in-memory data and file resources, and I’ll attempt to balance the desire for fresh data versus the desire for the best performance. Finally, I’ll cover a little bit of indirection as a means to intelligent caching.
Resource Caching
By resource caching, I mean caching anything serialized into a file format that's consumed at the endpoint. This includes everything from serialized objects (for example, XML and JSON) to images and videos. You can try using headers and meta tags to influence the cache behavior of the browser, but too often those suggestions aren't properly honored, and it's almost a foregone conclusion that service interfaces will ignore the headers. So, giving up hope that we can reliably cache slowly changing resource content at the Web client (at least as a guarantee of performance and behavior under load), we have to move back a step. Instead of moving all the way back to the Web server, though, for most resources we can use a content delivery network.
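Even though the client may not honor them, those cache hints cost little to emit. A minimal sketch of the IIS-side configuration for static content follows; the one-day max-age is a placeholder for illustration, not a recommendation:

```xml
<!-- web.config: hint that static resources may be cached client-side.
     The duration here is an illustrative placeholder; choose values
     appropriate to how often your resources actually change. -->
<system.webServer>
  <staticContent>
    <clientCache cacheControlMode="UseMaxAge"
                 cacheControlMaxAge="1.00:00:00" />
  </staticContent>
</system.webServer>
```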
Thinking about the path back from the client, there's an opportunity between the front-end Web servers and the client to introduce a waypoint of sorts, especially across broad geographies, that puts content closer to its consumers. The content isn't just cached at those points; more important, it's physically closer to the final consumers. The servers used for distribution are known collectively as a content delivery (or distribution) network. In the early days of the Internet explosion, distributed resource caching for the Web was fairly new, and companies such as Akamai Technologies found a great opportunity in selling services to help Web sites scale. Fast-forward a decade and the strategy is more important than ever in a world where the Web brings us together while we remain physically apart. For Azure, Microsoft provides the Azure Content Delivery Network (CDN). Although caching content and moving it closer to the consumer is a valid strategy for almost any site, in practice the CDN is typically used by Web sites that serve at high scale, serve large quantities or sizes of resources, or both. A good post about using the Azure CDN can be found on the blog of Steve Marx, who works on the Azure team (bit.ly/fvapd7).
In most cases when deploying a Web site, it seems fairly obvious that the files need to be placed on the servers for the site. In an Azure Web Role, the site contents are deployed as part of the package, so: check, I'm done. Wait, the latest images from marketing didn't get pushed with the package; time to redeploy. Updating that content currently means, realistically, redeploying the package. Sure, the new package can be deployed to staging and swapped in, but that won't happen without delay or a possible hiccup for the user.
A straightforward way to provide an updatable front-end Web cache of content is to store most of the content in Azure Storage and point all of the URIs to the Azure Storage containers. However, for various reasons, it might be preferable to keep the content with the Web Roles. One way to ensure that the Web Role content can be refreshed or that new content can be added is to keep files in Azure Storage and move them to a Local Resource storage container on the Web Roles as needed. There are a couple of variations on that theme available, and I discussed one in a blog post from March 2010 (bit.ly/u08DkV).
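The refresh itself is mostly file plumbing. Here's a minimal sketch of the pattern; a plain source directory stands in for the Azure Storage container, and a real Web Role would enumerate blobs with the storage client library instead. ContentRefresher and both path parameters are names I've made up for illustration:

```csharp
using System.IO;

// Sketch of the refresh pattern: pull updated content into the role's
// Local Resource path. A plain directory stands in for the Azure Storage
// container here; swap in blob enumeration for a real role.
public static class ContentRefresher
{
    public static void Refresh(string sourcePath, string localResourcePath)
    {
        Directory.CreateDirectory(localResourcePath);
        foreach (string file in Directory.GetFiles(sourcePath))
        {
            string target = Path.Combine(
                localResourcePath, Path.GetFileName(file));

            // Copy only missing or stale files so repeated refreshes
            // (for example, on a timer) stay cheap.
            if (!File.Exists(target) ||
                File.GetLastWriteTimeUtc(file) > File.GetLastWriteTimeUtc(target))
            {
                File.Copy(file, target, true);
            }
        }
    }
}
```

A real implementation would also handle deletions and compare blob ETags or last-modified metadata rather than file timestamps.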
In-Memory Caching
While the previous discussion focused on the movement of file-based resources, I'll focus next on the data and dynamically rendered content of the site. I've done tons of testing and optimization work targeting the performance of sites and the databases behind them. Without exception, a solid caching plan and implementation, covering both output caching (rendered HTML that doesn't have to be rendered again and can simply be sent to the client) and data caching (usually cache-aside style), will get you very far in improving both scale and performance, assuming the database implementation isn't inherently broken.
The heavy lifting in implementing a caching strategy within a site is in determining what gets cached and how frequently it’s refreshed versus what remains dynamically rendered on each request. Beyond the standard capabilities provided by the Microsoft .NET Framework for output cache and System.Web.Caching, Azure provides a distributed cache named Azure AppFabric Cache (AppFabric Cache).
Distributed Cache
A distributed cache helps solve several problems. For example, although caching is always recommended for site performance, using session state is typically contraindicated even though it provides a contextual cache. The reason is that session state either ties a client to a particular server, which hurts scalability, or must be synchronized across the servers in a farm, which is generally acknowledged, for good reason, to have issues and limitations. The session-state problem is solved by backing it with a capable, stable distributed cache. This lets the servers have the data without continually reaching off the box to get it, while also providing a mechanism to write to the data and have it seamlessly propagated across the cache clients. The developer gets a rich contextual cache while the Web farm keeps its scale characteristics.
The best news about AppFabric Cache is that, for session state, you can use it without doing much more than changing some configuration settings, and it has an easy-to-use API for programmatic use. Take a look at Karandeep Anand and Wade Wegner's article in the April 2011 issue for some good details on using the cache (msdn.microsoft.com/magazine/gg983488).
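The session-state wiring amounts to a custom provider entry in web.config. A sketch along these lines shows the shape; the provider type and attribute names follow the Azure AppFabric Cache SDK samples of the time, so verify them against your SDK version:

```xml
<!-- web.config: back ASP.NET session state with AppFabric Cache.
     Names here follow the AppFabric SDK samples and may vary by
     SDK version; the cache client itself is configured separately. -->
<sessionState mode="Custom"
              customProvider="AppFabricCacheSessionStoreProvider">
  <providers>
    <add name="AppFabricCacheSessionStoreProvider"
         type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache"
         cacheName="default" />
  </providers>
</sessionState>
```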
Unfortunately, if you’re working with an existing site that directly calls System.Web.Caching in the code, weaving AppFabric Cache in will be a bit more work. There are two reasons for this:
- The difference in APIs (see Figure 1)
- The strategy of what to cache and where
Figure 1 Add Content by Cache API
Add to Cache (AppFabric Cache):

DataCacheFactory cacheFactory = new DataCacheFactory(configuration);
DataCache appFabCache = cacheFactory.GetDefaultCache();
string value = "This string is to be cached in the shared cache.";
appFabCache.Put("SharedCacheString", value);

Add to Cache (System.Web.Caching):

System.Web.Caching.Cache localCache = HttpRuntime.Cache;
string value = "This string is to be cached locally.";
localCache.Insert("localCacheString", value);
Figure 1 illustrates that even the basic elements of the two APIs differ. Creating a layer of indirection to broker the calls will help keep your application code agile. Obviously, some work will be required to expose the advanced features of each cache type through such a layer, but the benefits will outweigh the effort to implement the functionality needed.
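One minimal shape for that layer is a common interface with one implementation per cache. The interface and names here are my own, not a framework API; the dictionary-backed provider is a stand-in, and real implementations would wrap DataCache (AppFabric) and System.Web.Caching.Cache behind this same interface:

```csharp
using System.Collections.Generic;

// A thin indirection layer: application code talks to ICacheProvider
// and never to a concrete cache API, so the backing cache can change
// without touching page code.
public interface ICacheProvider
{
    void Put(string key, object value);
    object Get(string key);
}

// Stand-in provider for illustration. A real local provider would
// delegate to System.Web.Caching.Cache; a shared one to DataCache.
public class InMemoryCacheProvider : ICacheProvider
{
    private readonly Dictionary<string, object> _store =
        new Dictionary<string, object>();

    public void Put(string key, object value)
    {
        _store[key] = value; // Last write wins, like Cache.Insert.
    }

    public object Get(string key)
    {
        object value;
        return _store.TryGetValue(key, out value) ? value : null;
    }
}
```

Page code then holds an ICacheProvider reference, and swapping local for shared caching becomes a composition decision rather than an edit to every call site.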
Although the distributed cache does solve some generally difficult problems, it shouldn't be sold as the snake oil that cures all, or it will likely have about the same efficacy as snake oil. First, depending on how things are balanced and on what data goes into the cache, more off-machine fetches may be required to get data into the local cache client, which hurts performance. More important is the cost of deployment. As of this writing, 4GB of AppFabric shared cache costs $325 per month. That isn't a large amount of money by itself, and 4GB seems like a good bit of cache space, but on a high-traffic site, especially one backing session state with AppFabric Cache and serving a lot of rich targeted content, it would be easy to fill multiple caches of that size. Consider product catalogs that have price differences based on customer tiers or custom contract pricing.
Cache-Aside Indirection
Like many things in the technology industry, and I would guess many others, design is some mix of the ideal technical implementation modified by fiscal reality. Thus, even when you're using Windows Server 2008 R2 AppFabric Caching, there are reasons to keep using the local caching provided by System.Web.Caching. At a first pass of indirection, I might have wrapped the calls to each caching library and provided a function for each, such as AddToLocalCache(key, object) and AddToSharedCache(key, object). However, that means each time a cache operation is needed, the developer makes a rather opaque and personal decision about where the caching should happen.

Such logic breaks down quickly under maintenance and on larger teams, and it will inevitably lead to unforeseen errors, because a developer can add an object to an inappropriate cache, or add to one cache and accidentally fetch from another. The result is a lot of extra data fetching, because data won't be in the cache, or will be in the wrong cache, when it's requested. This leads to scenarios such as noticing unexpectedly poor performance only to find, on examination, that the add operations went to one cache and the get operations inexplicably went to the other, for no better reason than that the developer forgot or mistyped.

Moreover, when a system is planned properly, its data types (entities) are identified ahead of time, and that definition should include where each entity is used, its consistency requirements (especially across load-balanced servers) and how fresh it must be. It follows that decisions about where to cache (shared or not) and about expiry can be made ahead of time and made part of the type's declaration.
As I mentioned previously, there should be a plan for caching. Too many times it's haphazardly added at the end of a project, when it should be given the same weight of consideration and design as any other aspect of the application. This is especially important in the cloud, because poorly considered decisions often lead to extra cost in addition to degraded application behavior. When considering what data should be cached, one option is to identify the entities (data types) involved and their lifecycles within the application and the user session. Looking at it this way quickly reveals that it would be nice if the entity itself could intelligently cache based on its type. Luckily, this is an easy task with some assistance from a custom Attribute.
I’m skipping the setup for either cache because the previously referenced material covers that well enough. For my caching library, I’ve simply created a static class with static methods for my sample. In other implementations, there are good reasons to do this with instance objects, but for the simplicity of this example, I’m making it static.
I'll declare an enum to indicate the cache location, and a class that inherits from Attribute to implement my custom attribute, as shown in Figure 2.
Figure 2 Declaring an Enum and a Custom Attribute
public enum CacheLocationEnum
{
    None = 0,
    Local = 1,
    Shared = 2
}

public class CacheLocation : Attribute
{
    private CacheLocationEnum _location = CacheLocationEnum.None;

    public CacheLocation(CacheLocationEnum location)
    {
        _location = location;
    }

    public CacheLocationEnum Location { get { return _location; } }
}
Passing the location in the constructor makes the attribute easy to use in code later, but I also provide a read-only property to fetch the value, because I'll need it for a switch statement. Within my CacheManager library, I've created a couple of private methods for adding to the two caches:
private static bool AddToLocalCache(string key, object newItem)
{...}
private static bool AddToSharedCache(string key, object newItem)
{...}
For a real implementation, I'd likely need some other information (for example, cache name, dependencies, expiry and so on), but for now this will do. The main public function for adding content to the cache is a generic method, making it easy to determine the cache from the type, as shown in Figure 3.
Figure 3 Adding Content to the Cache
public static bool AddToCache<T>(string key, T newItem)
{
    bool retval = false;

    // Resolve the CacheLocation attribute declared on T; it decides
    // which cache, if any, receives the item.
    CacheLocation cacheLocationAttribute =
        (CacheLocation)System.Attribute.GetCustomAttribute(
            typeof(T), typeof(CacheLocation));

    if (cacheLocationAttribute == null)
    {
        return retval; // Type isn't marked for caching.
    }

    switch (cacheLocationAttribute.Location)
    {
        case CacheLocationEnum.None:
            break;
        case CacheLocationEnum.Local:
            retval = AddToLocalCache(key, newItem);
            break;
        case CacheLocationEnum.Shared:
            retval = AddToSharedCache(key, newItem);
            break;
    }

    return retval;
}
I simply use the type parameter to retrieve my custom attribute via the GetCustomAttribute(type, type) method. Once I have that, a call to the read-only property and a switch statement route the call to the appropriate cache provider. To make it work, I adorn my class declarations appropriately:
[CacheLocation(CacheLocationEnum.Local)]
public class WebSiteData
{
public int IntegerValue { get; set; }
public string StringValue { get; set; }
}
[CacheLocation(CacheLocationEnum.Shared)]
public class WebSiteSharedData
{
public int IntegerValue { get; set; }
public string StringValue { get; set; }
}
With all of my application infrastructure set up, I'm ready to consume it within the application code. I crack open the default.aspx.cs file to create the sample calls, adding code to create the types, assign some values and add them to the cache:
WebSiteData data = new WebSiteData();
data.IntegerValue = 10;
data.StringValue = "ten";
WebSiteSharedData sharedData = new WebSiteSharedData();
sharedData.IntegerValue = 50;
sharedData.StringValue = "fifty";
CachingLibrary.CacheManager.AddToCache<WebSiteData>("localData", data);
CachingLibrary.CacheManager.AddToCache<WebSiteSharedData>(
"sharedData", sharedData);
My type names make it obvious where the data will be cached. However, I could change the type names and it would be less obvious, with the caching still controlled by inspection of the custom Attribute. Using this pattern hides from the page developer the details of where the data gets cached, along with the rest of the cache-item configuration. Those decisions are left to the part of the team that creates the data dictionaries and prescribes the overall lifecycle of that data. Note the type being passed into the calls to AddToCache<T>(string, T). Implementing the rest of the methods of the CacheManager class (for example, GetFromCache) follows the same pattern used here for AddToCache.
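As a self-contained sketch of that read side, the following condenses the Figure 2 declarations and routes GetFromCache<T> through the same attribute lookup; dictionaries stand in for the two real stores, which would be System.Web.Caching locally and AppFabric Cache shared:

```csharp
using System;
using System.Collections.Generic;

public enum CacheLocationEnum { None = 0, Local = 1, Shared = 2 }

public class CacheLocation : Attribute
{
    private readonly CacheLocationEnum _location;
    public CacheLocation(CacheLocationEnum location) { _location = location; }
    public CacheLocationEnum Location { get { return _location; } }
}

public static class CacheManager
{
    // Stand-ins for the local and shared caches.
    private static readonly Dictionary<string, object> LocalStore =
        new Dictionary<string, object>();
    private static readonly Dictionary<string, object> SharedStore =
        new Dictionary<string, object>();

    // One attribute lookup shared by both the add and get paths,
    // so the two can never disagree about which cache a type uses.
    private static Dictionary<string, object> StoreFor(Type t)
    {
        CacheLocation attr = (CacheLocation)Attribute.GetCustomAttribute(
            t, typeof(CacheLocation));
        if (attr == null) return null;
        switch (attr.Location)
        {
            case CacheLocationEnum.Local: return LocalStore;
            case CacheLocationEnum.Shared: return SharedStore;
            default: return null;
        }
    }

    public static bool AddToCache<T>(string key, T newItem)
    {
        Dictionary<string, object> store = StoreFor(typeof(T));
        if (store == null) return false;
        store[key] = newItem;
        return true;
    }

    // The read side mirrors AddToCache: the attribute decides which
    // store to consult, so callers never name a cache explicitly.
    public static T GetFromCache<T>(string key) where T : class
    {
        Dictionary<string, object> store = StoreFor(typeof(T));
        object value;
        if (store != null && store.TryGetValue(key, out value))
            return value as T;
        return null;
    }
}

[CacheLocation(CacheLocationEnum.Shared)]
public class WebSiteSharedData
{
    public int IntegerValue { get; set; }
    public string StringValue { get; set; }
}
```

With this in place, a mismatched add/get pair of the kind described earlier simply can't be written, because neither call names a cache.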
Balancing Cost with Performance and Scale
Azure provides the necessary software infrastructure to help with nearly any aspect of your implementation, including caching, whether that caching is for resources distributed via the CDN or for data kept in the AppFabric Cache. The key to a great design, and subsequently a great implementation, is to balance cost with performance and scale. One last note: if you're working on a new application right now and plan to build caching into it, go ahead and put that layer of indirection in now. It's a little extra work, but as new features such as AppFabric Caching come online, it will make it easier to thoughtfully and effectively incorporate them into your application.
Joseph Fultz is a software architect at Hewlett-Packard Co., working as part of the HP.com Global IT group. Previously he was a software architect for Microsoft, working with its top-tier enterprise and ISV customers defining architecture and designing solutions.
Thanks to the following technical expert for reviewing this article: Wade Wegner