
Data Classification (Velocity)

[This topic is pre-release documentation and is subject to change in future releases. Blank topics are included as placeholders.]

By selecting the appropriate types of data to cache in your application, you can benefit the most from Microsoft project code named "Velocity." Data can take many forms and reside in different tiers of your application. Distributed caching makes diverse types of data easy to store and retrieve despite service boundary limits and differences in semantics.

Most applications use a single source for any data instance. For example, data stored in an application's primary database requires a high degree of consistency and integrity, and steps are taken to ensure that each piece of data is unique. Data in the middle tier that is operated on by business logic is usually a copy of this source data, and it may be combined with other pieces of data to be useful in the presentation tier. These middle-tier copies are the data appropriate for caching.

Understanding the various types of data helps define the degrees of caching that are possible using "Velocity." As seen in the following table, there are three types of data that are appropriate for distributed caching: reference, activity, and resource.

Data Type   Access Pattern
---------   --------------
Reference   Shared read
Activity    Exclusive write
Resource    Shared; concurrently read and written; accessed by a large number of transactions

Reference Data

Reference data is a version of source data that changes infrequently. It is either a direct copy of the original data or it is aggregated and transformed from multiple data sources. Reference data is refreshed periodically, usually at configured intervals or when data changes.

Because reference data does not change frequently, it is an ideal candidate for caching. Instead of using computing resources to re-aggregate and transform reference data each time it is requested, the data can be saved to the cache and reused for subsequent requests. Caching reference data across multiple applications or users in this way can help increase application scale and performance.

Examples of reference data include flight schedules and catalogs. For example, consider a catalog application that aggregates product information across multiple applications and data sources. The most common operation on catalog data is a shared read: browsing. A browse operation iterates over a large amount of product data, filters it, personalizes it, and then presents the selected data to a large number of users.

Because browse operations can be resource-intensive, this kind of catalog data is ideal for caching. If it is not cached, these operations unnecessarily tax the data source and can significantly affect the response time and throughput of the application.
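The Velocity API itself is not shown here. As a rough, language-neutral illustration of the pattern this describes, the following Python sketch (all names hypothetical) models cache-aside access to reference data: the expensive aggregation runs only on a cache miss, and the cached copy is refreshed after a configured interval.

```python
import time

class ReferenceCache:
    """A minimal cache-aside store for reference data with time-based expiration."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._items = {}  # key -> (value, expiry_time)

    def get(self, key, load):
        """Return the cached value, reloading from the source when absent or expired."""
        entry = self._items.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no work at the data source
        # Cache miss: aggregate/transform from the source, then cache the result.
        value = load(key)
        self._items[key] = (value, now + self.ttl)
        return value

# Example: an expensive catalog aggregation that runs only on a miss.
loads = []
def load_catalog_page(key):
    loads.append(key)  # record each trip to the data source
    return {"page": key, "products": ["widget", "gadget"]}

cache = ReferenceCache(ttl_seconds=300)
first = cache.get("page-1", load_catalog_page)   # miss: queries the source
second = cache.get("page-1", load_catalog_page)  # hit: served from the cache
```

Because many users share the same read-only copy, one aggregation can serve every subsequent browse request until the entry expires.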

Caching the data closer to the application can significantly improve performance and scalability. For this reason, "Velocity" offers the local cache feature. For more information, see Cache Clients and Local Cache (Velocity).

Activity Data

Activity data is generated by an executing activity as part of a business transaction. At the close of the transaction, the data is retired to the data source as historical or log information.

Examples of activity data include purchase orders, application session state, and online shopping carts. Consider the shopping cart data in an online buying application. Each shopping cart is exclusive to one buying session and is its own individual data collection. During the session, the shopping cart is cached and updated with selected products; it is visible and available only to that buying transaction. Upon checkout, as soon as the payment is applied, the shopping cart is retired from the cache to a data source application for additional processing. After the data source application processes the business transaction, the shopping cart information is logged for auditing and historical purposes.

While the buying session is active, the shopping cart is accessed both for read and write activities but is not shared. The exclusive access to the activity data makes it appropriate for distributed caching.

A distributed cache that stores activity data must be able to handle a large number of individual data collections and support the operations that affect those collections. To scale the application, the data collections must be distributed across the cache cluster.

Because the data collections are not shared, the individual collections of data can be distributed across the distributed cache and stored on separate cache hosts. By dynamically growing the distributed cache with additional cache hosts, the application can scale to meet increasing demand.
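How Velocity places collections internally is not described here. As a hypothetical sketch of the distribution idea in the paragraph above, the following Python snippet hashes each session's key to pick one cache host, so carts spread across the cluster and adding hosts adds capacity (all host names are invented for illustration).

```python
import hashlib

def cache_host_for(session_id, hosts):
    """Map a session's data collection to one cache host by hashing its key."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

# Three hypothetical cache hosts in the cluster.
hosts = ["cachehost1", "cachehost2", "cachehost3"]

# Each shopping cart is an unshared collection, so each can live on its own host.
placement = {sid: cache_host_for(sid, hosts)
             for sid in ["cart-17", "cart-18", "cart-19"]}
```

The mapping is deterministic, so every request for the same cart reaches the same host; real caches use more elaborate partitioning to limit data movement when hosts join or leave.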

With "Velocity," you can create a region for each of your individual data collections. Regions offer a rich set of tag-based operations for working with your data collections. For more information, see Tag-Based Methods (Velocity).
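The region and tag-based methods referenced above belong to the Velocity API and are not reproduced here. As a conceptual sketch only, the following Python model (hypothetical class and method names) shows the shape of the idea: a named container for one data collection whose items carry tags, retrievable by tag.

```python
class Region:
    """A named container for one data collection; items can carry tags."""

    def __init__(self, name):
        self.name = name
        self._items = {}  # key -> (value, set_of_tags)

    def put(self, key, value, tags=()):
        """Store an item in the region with an optional set of tags."""
        self._items[key] = (value, set(tags))

    def get_by_tag(self, tag):
        """Return every (key, value) pair in the region carrying the given tag."""
        return [(k, v) for k, (v, tags) in self._items.items() if tag in tags]

# One region per data collection, for example one per shopping cart.
cart = Region("cart-17")
cart.put("item-1", {"sku": "B100", "qty": 2}, tags=["book", "gift"])
cart.put("item-2", {"sku": "E200", "qty": 1}, tags=["electronics"])
books = cart.get_by_tag("book")
```

Because the region holds the whole collection together, tag queries stay local to that collection instead of scanning the entire cache.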

"Velocity" also lets you manage session state for ASP.NET Web applications. For more information, see How to: Configure a Session State Provider (XML) (Velocity).

Resource Data

Both reference (shared read) and activity (exclusive write) data are ideal for caching. But not all application data falls into these two categories. There is also data that is shared, concurrently read and written, and accessed by a large number of transactions. Such data is known as resource data.

Examples of resource data include user accounts and auction items. For example, consider an auction item. The auction item includes the description of the item and the current bidding information (such as the current bid and who placed it). The bidding information is volatile, unique to each bid, and concurrently accessed by a large number of users for read and write operations. The business logic is cached close to the resource data.

For tracking purposes, resource data is usually stored in online transaction processing (OLTP) data sources, but it is cached in the application tier to improve performance and free computing resources for the data source. In the auction example, caching the bid data on a single computer can provide some performance improvement, but for large-scale auctions a single cache cannot provide the required scale or availability. For this reason, some types of data can be partitioned and replicated in multiple caches across the distributed cache. However, because certain types of data are shared and concurrently updated, cache consistency must be preserved across the cluster.

To optimize scalability, spread your resource data out as much as you can and limit the use of regions. If you do use regions, put your data in several regions to allow the data to be distributed across the cache cluster.

"Velocity" supports both optimistic and pessimistic concurrency operations. For more information, see Concurrency Models (Velocity).
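Velocity's concurrency API is documented in the topic cited above and is not shown here. As a conceptual sketch of the optimistic model, the following Python snippet (all names hypothetical) tags each cached item with a version; an update succeeds only if the caller's version is still current, so a concurrent bidder holding a stale copy is rejected instead of silently overwriting the newer bid.

```python
class VersionMismatch(Exception):
    """Raised when an update is based on a stale version of an item."""

class OptimisticCache:
    """Each item carries a version; an update must name the current version."""

    def __init__(self):
        self._items = {}  # key -> (value, version)

    def get(self, key):
        """Return (value, version) so the caller can update optimistically."""
        return self._items[key]

    def put(self, key, value, expected_version=None):
        """Store value; fail if another writer has bumped the version meanwhile."""
        current = self._items.get(key)
        if current is not None and expected_version != current[1]:
            raise VersionMismatch(key)
        version = 1 if current is None else current[1] + 1
        self._items[key] = (value, version)
        return version

cache = OptimisticCache()
cache.put("auction-42", {"high_bid": 100})
bid, version = cache.get("auction-42")
cache.put("auction-42", {"high_bid": 110}, expected_version=version)  # succeeds

stale_rejected = False
try:
    # A second bidder still holding the old version loses the race.
    cache.put("auction-42", {"high_bid": 105}, expected_version=version)
except VersionMismatch:
    stale_rejected = True
```

A pessimistic model instead locks the item for the duration of the update; optimistic versioning avoids holding locks but requires callers to retry after a conflict.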

See Also

Concepts

General Concept Models (Velocity)
Concurrency Models (Velocity)
Expiration and Eviction (Velocity)
High Availability (Velocity)

Other Resources

Programming Guide (Velocity)
Administration Guide (Velocity)