
Caching Guidance


Caching is a common technique that aims to improve the performance and scalability of a system by temporarily copying frequently accessed data to fast storage located close to the application. Caching is most effective when an application instance repeatedly reads the same data, especially if the original data store is slow relative to the cache, is subject to a high level of contention, or is located far away, where network latency can make access slow.

Caching in Cloud Applications

There are two main types of cache commonly used by cloud applications:

  • An in-memory cache, where data is held locally on the computer running an instance of an application.
  • A shared cache, which can be accessed by several instances of an application running on different computers.

In-Memory Caching

The most basic type of cache is an in-memory store, held in the address space of a single process and accessed directly by the code that runs in that process. This type of cache is very quick to access, and it can provide an extremely effective strategy for storing modest amounts of static data (the size of a cache is typically constrained by the volume of memory available on the machine hosting the process). If you have multiple instances of an application that uses this model running concurrently, each application instance will have its own independent cache holding its own copy of data.
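As a minimal sketch of this idea, the following code holds entries in a plain dictionary capped at a fixed number of items (the class and its names are illustrative, not taken from any particular library):

```python
# A minimal per-process in-memory cache: a dictionary capped at a fixed
# number of entries. Illustrative only; real caches add expiration and
# smarter eviction.

class InMemoryCache:
    def __init__(self, max_items=1000):
        self._store = {}
        self._max_items = max_items

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        if len(self._store) >= self._max_items and key not in self._store:
            # Evict an arbitrary entry to stay within the size limit.
            self._store.pop(next(iter(self._store)))
        self._store[key] = value

cache = InMemoryCache(max_items=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)   # forces an eviction
print(cache.get("c"))  # → 3
```

Because the dictionary lives in the process's own address space, lookups are fast, but each application instance ends up with its own independent copy of the data.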

You should think of a cache as a snapshot of the original data at some point in the past. If this data is not static, it is likely that different application instances will hold different versions of the data in their caches. Therefore, the same query performed by these instances could return different results, as shown in Figure 1.

Figure 1 - Using an in-memory cache in different instances of an application


Shared Caching

Using a shared cache can help to alleviate the concern that data may differ in each cache, as can occur with in-memory caching. Shared caching ensures that different application instances see the same view of cached data by locating the cache in a separate location, typically hosted as part of a separate service, as shown in Figure 2.

Figure 2 - Using a shared cache


An important benefit of using the shared caching approach is the scalability it can provide. Many shared cache services are implemented by using a cluster of servers, and utilize software that distributes the data across the cluster in a transparent manner. An application instance simply sends a request to the cache service, and the underlying infrastructure is responsible for determining the location of the cached data in the cluster. You can easily scale the cache by adding more servers.
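One simple way a cache service can distribute data transparently across a cluster is to hash each key to a server. The sketch below illustrates the idea with hypothetical node names; real shared caches (Redis Cluster, for example) use more robust schemes such as hash slots or consistent hashing:

```python
import hashlib

# Illustrative key-to-server routing for a clustered cache: hash the key
# and take it modulo the number of servers. Server names are hypothetical.

servers = ["cache-0", "cache-1", "cache-2"]

def server_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(server_for("user:42"))  # same key always routes to the same server
```

Note that this naive modulo scheme remaps most keys when a server is added or removed, which is why production systems prefer consistent hashing.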

The disadvantages of the shared caching approach are that the cache is slower to access because it is no longer held in the memory of each application instance, and the requirement to implement a separate cache service may add complexity to the solution.

Considerations for Using Caching

Caching is ideally suited to data that has a high proportion of reads compared to writes. The following sections describe in more detail the considerations for designing and using a cache.

Types of Data and Cache Population Strategies

The key to using a cache effectively lies in determining the most appropriate data to cache, and caching it at the appropriate time. The data may be added to the cache on demand the first time it is retrieved by an application, so that the application needs to fetch the data only once from the data store and subsequent accesses can be satisfied from the cache.

Alternatively, a cache may be partially or fully populated with data in advance, typically when the application starts (an approach known as seeding). However, it may not be advisable to implement seeding for a large cache as this approach can impose a sudden, high load on the original data store when the application starts running.
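The two population strategies can be sketched as follows; here a plain dictionary stands in for the slow backing data store, and all names are illustrative:

```python
# Two cache-population strategies: on-demand loading vs. seeding.
data_store = {"p1": "widget", "p2": "gadget", "p3": "gizmo"}
cache = {}

def get_on_demand(key):
    # On-demand: fetch from the store only on the first miss.
    if key not in cache:
        cache[key] = data_store[key]
    return cache[key]

def seed(keys):
    # Seeding: prepopulate selected entries, typically at startup.
    for key in keys:
        cache[key] = data_store[key]

seed(["p1", "p2"])          # partial seeding for frequently used data
print(get_on_demand("p3"))  # → gizmo (loaded on first access)
```

Partial seeding, as shown, limits the startup load on the data store to only the entries that are most likely to be needed.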

Often an analysis of usage patterns can help to decide whether to fully or partially prepopulate a cache, and to choose the data that should be cached. For example, it would probably be useful to seed the cache with the static user profile data for customers who use the application regularly (perhaps every day), but not for customers who use the application only once a week.

Caching typically works well with data that is immutable or that changes infrequently. Examples include reference information such as product and pricing information in an ecommerce application, or shared static resources that are costly to construct. Some or all of this data can be loaded into the cache at application startup to minimize demand on resources and to improve performance. It may also be appropriate to have a background process that periodically updates reference data in the cache to ensure it is up to date, or refreshes the cache when reference data changes.

Caching may be less useful for dynamic data. When the original data regularly changes, either the cached information can become stale very quickly or the overhead of keeping the cache synchronized with the original data store reduces the effectiveness of caching.

Performance testing and usage analysis should be carried out to determine whether prepopulation or on-demand loading of the cache, or a combination of both, is appropriate. The decision should be based on a combination of the volatility and usage pattern of the data. Cache utilization and performance analysis is particularly important in applications that encounter heavy loads and must be highly scalable. For example, in highly scalable scenarios it may make sense to seed the cache to reduce the load on the data store at peak times.

Caching can also be used to avoid repeating computations as the application is running. If an operation transforms data or performs a complicated calculation, it can save the results of the operation in the cache. If the same calculation is required subsequently, the application can simply retrieve the results from the cache.
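Caching computed results is commonly called memoization. In Python this can be done with the standard library's `lru_cache` decorator; the calculation below is just a stand-in for an expensive operation:

```python
from functools import lru_cache

# Memoization: cache the results of a calculation rather than stored data.
@lru_cache(maxsize=128)
def expensive_transform(n):
    # Stand-in for a costly computation.
    return sum(i * i for i in range(n))

expensive_transform(10_000)   # computed on the first call
expensive_transform(10_000)   # served from the cache on the second call
print(expensive_transform.cache_info().hits)  # → 1
```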

An application can modify data held in a cache, but you should consider the cache as a transient data store that could disappear at any time. Do not store valuable data only in the cache, but make sure that you maintain the information in the original data store as well. In this way, if the cache should become unavailable, you minimize the chance of losing data.

Using Read-Through and Write-Through Caching

Some commercial caching solutions implement read-through and write-through caching whereby an application always reads and writes data by using the cache. When an application fetches data, the underlying cache service determines whether the data is currently held in the cache, and if not the cache service retrieves the data from the original data store and adds it to the cache before returning the data to the application. Subsequent read requests should find the data in the cache.

Read-through caching effectively caches data on demand. Data that an application does not use will not be cached. When an application modifies data, it writes the changes to the cache. The cache service transparently makes the same change to the original data store.
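The behavior of a read-through/write-through facade can be sketched as follows; the application talks only to the cache object, which consults the backing store on a miss and propagates writes to it synchronously (all names here are illustrative):

```python
# Read-through / write-through sketch: the cache sits between the
# application and the backing store (a dict stands in for the store).

class ReadWriteThroughCache:
    def __init__(self, store):
        self._store = store
        self._cache = {}

    def read(self, key):
        if key not in self._cache:           # miss: fetch and populate
            self._cache[key] = self._store[key]
        return self._cache[key]

    def write(self, key, value):
        self._cache[key] = value
        self._store[key] = value             # write-through: synchronous

store = {"k": "old"}
c = ReadWriteThroughCache(store)
c.write("k", "new")
print(store["k"])  # → new (store updated at the same time as the cache)
```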


Write-through caches typically write changes to the data store synchronously, at the same time as the cache is updated. Some caching solutions implement a write-behind strategy, whereby the write to the data store is postponed until the data is about to be removed from the cache. This strategy can reduce the number of write operations performed and improve performance, at the risk of increased inconsistency between the cache and the data store.
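A write-behind variant can be sketched by tracking dirty entries and flushing them to the store later. In this illustrative example the flush is explicit; a real implementation would flush on eviction or on a timer:

```python
# Write-behind sketch: writes land only in the cache; dirty entries are
# flushed to the store later, trading consistency for fewer store writes.

class WriteBehindCache:
    def __init__(self, store):
        self._store = store
        self._cache = {}
        self._dirty = set()

    def write(self, key, value):
        self._cache[key] = value
        self._dirty.add(key)          # store write is postponed

    def flush(self):
        for key in self._dirty:
            self._store[key] = self._cache[key]
        self._dirty.clear()

store = {}
c = WriteBehindCache(store)
c.write("k", 1)
c.write("k", 2)   # two cache writes...
c.flush()
print(store)      # → {'k': 2} (...collapsed into one store write)
```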

For systems that do not provide read-through and write-through caching, it is the responsibility of the applications that use the cache to maintain the data in the cache. The most straightforward way to do this is to implement the Cache-Aside pattern, which you can use to build an abstraction layer in your application code that emulates a read-through and write-through cache.
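The Cache-Aside pattern can be sketched in a few lines; the application itself reads through the cache on a miss and invalidates the cached entry when it updates the data store (the dict-based store and function names are illustrative):

```python
# Cache-Aside sketch: the application, not the cache service, manages the
# cache, and invalidates entries when it updates the data store.

data_store = {"item": "v1"}
cache = {}

def read(key):
    if key not in cache:
        cache[key] = data_store[key]   # miss: load from the store
    return cache[key]

def update(key, value):
    data_store[key] = value            # write to the store first...
    cache.pop(key, None)               # ...then invalidate the cached copy

read("item")
update("item", "v2")
print(read("item"))  # → v2 (reloaded from the store after invalidation)
```

Invalidating rather than updating the cached entry on write keeps the logic simple and avoids caching data that may never be read again.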

In some scenarios, it can be advantageous to cache highly volatile data without immediately persisting changes to the original data store. For example, if an application modifies data in the cache and expects it to change again very quickly, it can refrain from updating the original data store until the system becomes inactive, and then save only the final state of the data. In this way, the application avoids performing a series of slow, expensive write operations, and the data store experiences less contention. However, do not use this strategy if the application cannot safely reconstruct its state if the cache is lost, or if the system requires a full audit trail of every change made to the data.

Managing Data Expiration in a Cache

In most cases, data held in a cache is a copy of the data held in the original data store. It is possible that the data in the original data store might change after it was cached, causing the cached data to become stale. Many caching systems enable you to configure the cache to expire data and reduce the period for which data may be out of date.

When cached data expires it is removed from the cache, and the application must retrieve the data from the original data store (it can put the newly-fetched information back into cache). You can set a default expiration policy when you configure the cache. In many cache services you can also stipulate the expiration period for individual objects when you store them programmatically in the cache. This setting overrides any cache-wide expiration policy, but only for the specified objects.
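The expiration mechanics described above can be sketched by storing an absolute expiry time with each entry; a read past that time is treated as a miss. The per-item `ttl` argument overrides the cache-wide default (all names here are illustrative):

```python
import time

# Expiration sketch: each entry records when it expires; reading an
# expired entry evicts it and reports a miss.

DEFAULT_TTL = 60.0  # cache-wide default expiration period, in seconds

class ExpiringCache:
    def __init__(self):
        self._store = {}

    def put(self, key, value, ttl=DEFAULT_TTL):
        # A per-item ttl overrides the cache-wide default for this entry.
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: evict and report a miss
            return None
        return value

c = ExpiringCache()
c.put("k", "v", ttl=0.05)   # short per-item override
print(c.get("k"))           # → v
time.sleep(0.06)
print(c.get("k"))           # → None (expired)
```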


Consider the expiration period for the cache and the objects that it contains carefully. If you make it too short, objects will expire too quickly and you will reduce the benefits of using the cache. If you make the period too long, you risk the data becoming stale.

It is also possible that the cache might fill up if data is allowed to remain resident for a long time. In this case, any requests to add new items to the cache might cause some items to be forcibly removed, in a process known as eviction. Cache services typically evict data on a least-recently-used (LRU) basis, but you can usually override this policy and prevent items from being evicted. However, if you adopt this approach you risk your cache exceeding the memory that it has available, and an application that attempts to add an item to the cache will fail with an exception.

Some caching implementations may provide additional eviction policies. These typically include the most-recently-used policy (in the expectation that the data will not be required again) and first-in-first-out policy (oldest data is evicted first).
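An LRU eviction policy can be sketched with Python's `OrderedDict`, which keeps entries in access order so the least-recently-used item is always at the front when the cache overflows:

```python
from collections import OrderedDict

# LRU eviction sketch: entries are kept in access order, and the
# least-recently-used entry is evicted when the cache exceeds capacity.

class LRUCache:
    def __init__(self, capacity):
        self._capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)     # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict least recently used

c = LRUCache(capacity=2)
c.put("a", 1)
c.put("b", 2)
c.get("a")         # touch "a", so "b" is now least recently used
c.put("c", 3)      # exceeds capacity: evicts "b"
print(c.get("b"))  # → None
```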

Managing Concurrency in a Cache

Caches are often designed to be shared by multiple instances of an application. Each application instance can read and modify data in the cache. Consequently, the same concurrency issues that arise with any shared data store are also applicable to a cache. In a situation where an application needs to modify data held in the cache, you may need to ensure that updates made by one instance of the application do not blindly overwrite the changes made by another instance. Depending on the nature of the data and the likelihood of collisions, you can adopt one of two approaches to concurrency:

  • Optimistic. The application checks to see whether the data in the cache has changed since it was retrieved, immediately prior to updating it. If the data is still the same, the change can be made. Otherwise, the application has to decide whether to update it (the business logic that drives this decision will be application-specific). This approach is suitable for situations where updates are infrequent, or where collisions are unlikely to occur.
  • Pessimistic. The application locks the data in the cache when it retrieves it to prevent another instance from changing the data. This process ensures that collisions cannot occur, but could block other instances that need to process the same data. Pessimistic concurrency can affect the scalability of the solution and should be used only for short-lived operations. This approach may be appropriate for situations where collisions are more likely, especially if an application updates multiple items in the cache and must ensure that these changes are applied consistently.
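The optimistic approach amounts to a compare-and-swap on a version number. The sketch below is illustrative (a real shared cache would perform the version check atomically on the server):

```python
# Optimistic concurrency sketch: each cached value carries a version
# number, and an update succeeds only if the version is unchanged since
# the value was read.

cache = {"k": ("v1", 1)}  # key -> (value, version)

def read(key):
    return cache[key]     # returns (value, version)

def try_update(key, new_value, expected_version):
    value, version = cache[key]
    if version != expected_version:
        return False      # another instance changed it; caller decides what to do
    cache[key] = (new_value, version + 1)
    return True

value, version = read("k")
print(try_update("k", "v2", version))  # → True (version matched)
print(try_update("k", "v3", version))  # → False (version is now stale)
```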

Implementing High Availability and Security

Some cache services provide a high-availability option that implements automatic failover if part of the cache becomes unavailable. Additionally, irrespective of the cache service you use, you should consider how to protect the data held in the cache from unauthorized access.


Determining whether to implement caching, deciding which data to cache, estimating the size of the cache, and planning the most appropriate caching topology to use are complex, application-specific tasks. The topic Capacity Planning for Microsoft Azure Cache Service on MSDN provides detailed guidance and tools that you can use to determine a cost-effective strategy for caching data by using Azure Cache.

Related Patterns and Guidance

The following pattern may also be relevant to your scenario when implementing caching in your applications:

  • Cache-Aside Pattern. This pattern describes how to load data on-demand into a cache from a data store. This pattern also helps to maintain consistency between data held in the cache and the data in the original data store.
