Techniques for Aggregating List and Site Information

Aggregating list items is one of the most common tasks that a SharePoint application performs. The list items can be distributed across many sites and site collections. This means that aggregating data can be a costly operation for systems with complex topologies. Consequently, it frequently causes performance issues.

Although SharePoint offers several approaches that minimize the impact of aggregating data, developers and architects should perform this task judiciously. Always consider the amount of information that is returned by a query. Instead of returning many rows and then using an Extensible Stylesheet Language Transformation (XSLT) transformation to filter out all but a few of them, consider optimizing the query. For more information about designing information architecture, see Determine the information architecture of your site and Information architecture in Office SharePoint Server, on TechNet.

After you decide that you must aggregate information, determine which aggregation technique is appropriate. You should consider the following points:

  • Aggregation scope. Do you need to aggregate information from a single site or from multiple sites? Will the query traverse sites within a site hierarchy or cross-site hierarchies? Does the query span site collections?
  • Information scope. What is the type of information being queried? Does the query apply only to list items, or does it involve other objects like aggregating information stored in property bags? Do you need to derive information from the relevant objects?
  • Performance characteristics. How much data will be retrieved? How much flexibility do you need in manipulating the retrieved data? For detailed information about list access performance, see Working with Large Lists in Office SharePoint Server 2007.
  • Data staleness. Does the retrieved data need to be real time or is the possibility of the data being stale acceptable?

Within site collections, SharePoint can use SQL to query the site collection database. However, if you want to aggregate information across site collections, you can no longer use SQL. This means that you must implement the in-memory data joins yourself. In most circumstances, a better approach is to use SharePoint Enterprise search and accept some data latency.

The following table characterizes the different aggregation techniques.

Aggregation technique

Characteristics

Appropriate use

Content Query Web Part

Aggregation scope: site collection.

Information scope: aggregates data along the site hierarchy; only works with list data; works with multiple content types; does not provide direct access to SharePoint objects (see Portal site map provider).

Performance: optimized for cross-list queries. Uses the object cache.

Accuracy: Can control if results are cached. If data is cached, then may be stale.

This Web Part is only for querying list data.

The SharePoint user interface only allows XSLT transformations. It is only available with Microsoft Office SharePoint Server.

For an example of how to use this Web Part, see the Partner Portal application. The ActiveIncidentTasksWebPart is a Content Query Web Part that aggregates all of a partner's open incident tasks and displays them on the incident dashboard. Used for no-code solutions.


Portal site map provider

Aggregation scope: site collection.

Information scope: aggregates data along the site hierarchy; works with multiple content types; works with sites, Webs and lists; provides direct access to SharePoint objects for data manipulation, such as calculating values.

Performance: optimized for queries. Uses the object cache.

Accuracy: Cached result sets may be stale.

It is more complex than the Content Query Web Part.

It is only available with Microsoft Office SharePoint Server.

For an example, see the Partner Portal application. The IncidentStatusListWebPart in the Contoso.PartnerPortal.Collaboration.Incident project uses a portal site map provider object in its presenter class (IncidentStatusListPresenter.cs) to aggregate all of a partner's incidents and to display them on the incident dashboard. Used for code based solutions.

Direct cross-site collection query

Aggregation scope: across a server farm.

Information scope: provides no data aggregation; works with multiple content types; works with sites, Webs and lists.

Performance: Not optimized or cached; should only be used for targeted queries.

Accuracy: no latency.

It cannot traverse a site hierarchy.

It is best suited for targeted, cross-farm queries.

For an example, see the Partner Portal application. The PartnerPromotionsWebPart that is on the partner home page uses a direct cross-site collection query to retrieve all the promotions that are intended for the currently logged-in user. To perform the query, it calls the partner promotion repository's GetAllMyPromos method.

Search

Aggregation scope: works across the server farm.

Information scope: aggregates along and across arbitrary information hierarchies and content types; does not provide direct access to objects.

Performance: Can query large data sets across sites and the server farm; can perform complex query and document searches; efficient for aggregating different types of data or data that is spread across a number of site collections.

Accuracy: Data is as accurate as the last crawl.

The FullTextSqlQuery class is only available with the Microsoft Office SharePoint Server object model. Security information is collected at the time of the crawl. This requires specialized logic for non-native search data sources.

It is best suited for aggregating information across site collections or a server farm.

Data latency must be acceptable.

See the Partner Portal application for an example. The PartnerRollupWebPart in the Contoso.PartnerPortal.PartnerCentral project uses the search API to aggregate all open tasks for all partners. It displays them on the Partner Central site for administrative purposes.

Home page on MSDN | Community site