Capacity requirements for the Web Analytics Shared Service in SharePoint Server 2010
Applies to: SharePoint Server 2010
Initial capacity testing was performed for a simulated midsized deployment of Microsoft SharePoint Server 2010 and other applications that included 30,000 SharePoint entities. This article describes the results of the capacity testing activities and contains guidance on capacity management for the Web Analytics service application in SharePoint Server 2010.
In SharePoint Server 2010, the Web Analytics service application enables you to collect, report, and analyze the usage and effectiveness of SharePoint Server 2010 sites. Web Analytics features include reporting, Web Analytics workflow, and Web Analytics Web Part. For more information, see Reporting and usage analysis overview (SharePoint Server 2010).
The aspects of capacity planning that are described in this article include the following:
Description of the architecture and topology.
Capacity planning guidelines based on key factors such as total expected traffic and number of SharePoint components.
Description of the other factors that affect the performance and capacity requirements.
Before you continue to read this article, make sure that you understand key concepts related to SharePoint Server 2010 capacity management. The resources that are listed in this section can help you learn about frequently used terms and get an overview of the recommended approach to capacity management. These resources can also help you use the information that is provided in this article more effectively.
For more conceptual information about performance and capacity management, see Performance and capacity management (SharePoint Server 2010).
In this article:
Introduction
Hardware specifications and topology
Capacity requirements
Introduction
Overview
As part of SharePoint Server 2010, the Web Analytics service application is a set of features that you can use to collect, report, and analyze the usage and effectiveness of a SharePoint Server 2010 deployment. You can organize SharePoint Web Analytics reports into three main categories:
Traffic
Search
Inventory
SharePoint Web Analytics reports are typically aggregated for various SharePoint entities, such as sites, site collections, and Web applications for each farm. To view an architectural overview of the Web Analytics service application in a SharePoint deployment, see Architectural overview later in this article.
The Web Analytics shared service requires resources primarily at the application server and database server level. This article does not cover capacity planning for the Web server layer, because the Web Analytics service's capacity requirements are minimal at this level.
This article contains the capacity requirements for several application servers and Microsoft SQL Server based computers, according to the following criteria:
Total expected site traffic (clicks, search queries, ratings).
Number of SharePoint components (Site, Site Collection, and Web Application) for each farm.
Other, less significant factors that can affect the capacity requirements are summarized in Other factors later in this article.
Architectural overview
The following diagram (Figure 1) shows the flow of the site usage data from a Web browser to the analytics databases, and then back to the Web browser as reports. The usage data is logged to the usage files on the Web servers. The usage timer job calls the Logging Web Service to submit the raw data from the usage files. The Logging Web Service writes it to the staging database, where the raw data is stored for seven days (this is not configurable). The Web Analytics components Log Batcher and User Behavior Analyzer clean and process the raw data on the staging database. The Report Consolidator runs one time every 24 hours. The Report Consolidator aggregates the raw data from the staging database on various dimensions, and then writes it to the reporting database. The aggregated data is stored in the reporting database for a default period of 25 months (this is configurable).
Figure 1. SharePoint Server 2010 Web Analytics architectural overview
The performance of the Logging Web Service depends primarily on the number of application servers. (Scaling out is available for the application servers.) The performance of the Log Batcher and User Behavior Analyzer depends primarily on the analytics staging database. The combined read and write activity of all these components can slow the staging database, and with it the processing of the raw data. (Scaling out is available for the staging database.) The performance of the Report Consolidator depends primarily on the reporting database. (Scaling out of the reporting database is not supported.)
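The two retention periods described in this overview can be illustrated with a short calculation. The following Python sketch is not part of SharePoint Server 2010. It only computes, for a given date, the oldest data that each database would still hold if data is purged strictly by calendar date, which is an assumption. The function names are hypothetical.

```python
from datetime import date, timedelta

STAGING_RETENTION_DAYS = 7        # fixed, per this article
REPORTING_RETENTION_MONTHS = 25   # default, configurable per this article


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of the month is clamped to the last day of the target month.
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # First day of the month after the target month, used to find its last day.
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(day.day, last_day))


def retention_cutoffs(today: date,
                      reporting_months: int = REPORTING_RETENTION_MONTHS) -> dict:
    """Oldest dates still retained, assuming a strict calendar-date purge."""
    return {
        "staging": today - timedelta(days=STAGING_RETENTION_DAYS),
        "reporting": months_before(today, reporting_months),
    }


if __name__ == "__main__":
    print(retention_cutoffs(date(2010, 6, 15)))
```

Passing a different `reporting_months` value models a farm in which the reporting retention period has been changed from its default.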
Hardware specifications and topology
This section provides detailed information about the hardware, software, topology, and configuration of a case study environment.
Hardware
Note
This environment is scaled to accommodate initial builds of SharePoint Server 2010 and other products. Therefore, the deployed hardware has larger capacity than necessary to serve the demand typically experienced by this environment. This hardware is described only to provide additional context for this environment and serve as a starting point for similar environments. It is important to conduct your own capacity management based on your planned workload and usage characteristics. For more information about the capacity management process, see Performance and capacity management (SharePoint Server 2010).
Web servers
This article does not cover capacity planning for the Web server layer, because the Web Analytics service's capacity requirements are minimal at this level.
Application servers
The following table describes the configuration of each application server. Depending on the site traffic and the number of SharePoint components that are involved, you will need one or more application servers.
Application server | Minimum requirement |
---|---|
Processors | 4 quad core @ 2.33 GHz |
RAM | 8 GB |
Operating system | Windows Server 2008, 64-bit |
Size of the SharePoint drive | 300 GB |
Number of network adapters | 1 |
Network adapter speed | 1 GB |
Authentication | NTLM |
Load balancer type | SharePoint Load Balancer |
Software version | SharePoint Server 2010 |
Services running locally | |
Database servers
Instances of SQL Server were required for both the staging and reporting databases. The following table describes the configuration of each database server.
Database server | Minimum requirement |
---|---|
Processors | 4 quad core @ 2.4 GHz |
RAM | 32 GB |
Operating system | Windows Server 2008, 64-bit |
Disk size | 3 terabytes. Note: Although we used this disk size for our capacity testing, your environment will likely require a much larger disk size to support Web Analytics. |
Number of network adapters | 1 |
Network adapter speed | 1 GB |
Authentication | NTLM |
Software version | SQL Server 2008 |
Note
We used the configuration that is described in the previous table for our capacity testing. Your environment will likely require fast, enterprise-class storage to support Web Analytics. For example, you will want to use a multi-disk RAID array or a similar disk configuration to increase Input/Output Operations per Second (IOPS) for daily incremental synchronization. In addition, we recommend that you spread the data load for SQL Server across multiple disk spindles.
For more information about best practices for configuring SQL Server, see the following resources:
- Storage Top 10 Best Practices (https://go.microsoft.com/?linkid=9763249)
- Disk Partition Alignment Best Practices for SQL Server (https://go.microsoft.com/?linkid=9763104)
- Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications (https://go.microsoft.com/?linkid=9763105)
Topology
The following diagram (Figure 2) shows the Web Analytics topology.
Figure 2. Web Analytics topology
Capacity requirements
Testing methodology
This section presents the capacity requirements in terms of the total site traffic per day (measured by number of clicks, search queries, and ratings) that can be supported by different numbers of application servers and SQL Server based computers. The numbers presented are for a midsize SharePoint deployment that has about 30,000 SharePoint entities. The Web Analytics shared service aggregates the data for each day. Therefore, the data volume that is presented corresponds to the total number of records (clicks, search queries, and ratings) that the SharePoint farm is expected to receive each day.
This section provides diagrams that show the daily site traffic that can be supported by one, two, or three application servers (Figure 3) and the daily site traffic that can be supported by each of the database configurations (Figure 4). In the diagrams, data is shown by using two colors:
Green: Green values indicate the safe limit for the site traffic that can be processed for the corresponding number of application servers and SQL Server based computers.
Yellow: Yellow values indicate the expected limit for the site traffic that can be processed for the corresponding number of application servers and SQL Server based computers.
The green and yellow values are estimates that are based on two key factors:
Total site traffic, measured by number of page view clicks, search queries, and ratings.
Number of SharePoint entities, such as sites, site collections, and Web applications, for each farm.
The estimates also depend on other properties of the data and the data retention period in the reporting database. For testing, the other properties of the data were maintained as constant as described in Dataset description later in this section.
Also, in smaller SharePoint deployment environments, the application servers and SQL Server based computers can be shared with other SharePoint services and databases. The capacity information in this article comes from a test environment in which the Web Analytics shared service was the only major service running on the servers. Actual performance in environments that actively run other shared services at the same time might vary.
To determine the capacity requirements for your environment, first estimate the expected daily site traffic and the number of components that you might use for a SharePoint deployment. Then estimate the number of application servers and the number of SQL Server based computers independently, by using Figure 3 and Figure 4.
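The estimation procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a tool that ships with SharePoint Server 2010. The `TopologyLimit` type and the function names are hypothetical, and the values in `PLACEHOLDER_LIMITS` are placeholders that are not taken from Figure 3 or Figure 4. Replace them with the green (safe) and yellow (expected) values that you read from the figures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TopologyLimit:
    name: str
    safe_records_per_day: float      # green value read from the figure
    expected_records_per_day: float  # yellow value read from the figure


def daily_records(clicks: int, search_queries: int, ratings: int) -> int:
    """Each click, search query, or rating makes up one record."""
    return clicks + search_queries + ratings


def smallest_topology(records: float,
                      limits: Sequence[TopologyLimit]) -> Tuple[Optional[str], str]:
    """Pick a topology from `limits`, which must be ordered smallest first.

    Prefers the smallest topology whose safe limit covers the traffic, then
    falls back to the smallest one whose expected limit covers it.
    """
    for limit in limits:
        if records <= limit.safe_records_per_day:
            return limit.name, "safe"
    for limit in limits:
        if records <= limit.expected_records_per_day:
            return limit.name, "expected"
    return None, "exceeds"


# PLACEHOLDER values only: these are NOT the numbers from Figure 3.
PLACEHOLDER_LIMITS = [
    TopologyLimit("1 application server", 100, 150),
    TopologyLimit("2 application servers", 200, 300),
    TopologyLimit("3 application servers", 300, 450),
]

if __name__ == "__main__":
    records = daily_records(clicks=120, search_queries=30, ratings=10)
    print(smallest_topology(records, PLACEHOLDER_LIMITS))
```

Run the same lookup a second time with a separate list of limits for the SQL Server configurations, because the two estimates are independent of each other.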
Dataset description
The dataset that was selected for the test environment has approximately 30,000 SharePoint components, which include all Web applications, site collections, and sites. Other characteristics of the data that were kept constant in the environment are also listed in the following table.
Dataset characteristics | Value |
---|---|
Number of SharePoint components | 30,000 |
Number of unique users | 117,000 |
Number of unique queries | 68,000 |
Number of unique assets | 500,000 |
Data size in the reporting database | 200 GB |
The total site traffic, measured by number of clicks, search queries, and ratings, was increased as part of this case study to establish the number of records that can be supported by the corresponding topology.
Important
Some typically used topologies generally exceed the capacity planning guidance. Those topologies include the following:
- Team sites
- My Site Web sites
- Self-provisioning portals
Application servers
The following diagram (Figure 3) shows the daily site traffic that can be supported by one, two, or three application servers. The site traffic is represented in millions of records (each click, search query, or rating makes up a record) each day. The yellow line represents the expected number of records for the corresponding topology, whereas the green line represents the safe assumption for the number of records.
Figure 3. Daily site traffic vs. the application servers topology
The application servers are neither very CPU-intensive nor very memory-intensive. Therefore, CPU and memory usage are not summarized for this section.
SQL Server based computers
The following diagram (Figure 4) shows the daily site traffic that can be supported by each of the following configurations:
One instance of SQL Server for both staging and reporting databases (1S+R).
Two instances of SQL Server, one staging database and one reporting database (1S1R).
Three instances of SQL Server, two staging databases and one reporting database (2S1R).
The site traffic is represented in millions of records (each click, search query, or rating makes up a record) each day. The yellow line represents the expected number of records for the corresponding topology, whereas the green line represents the safe assumption for the number of records.
Figure 4. Daily site traffic vs. SQL Server topology
The following table summarizes the CPU and memory usage of the various components on the instances of SQL Server that are hosting the staging database and the reporting database.
Configuration | 1S+R | 1S1R | 1S1R | 2S1R | 2S1R |
---|---|---|---|---|---|
Databases hosted | Staging + Reporting | Staging | Reporting | Staging | Reporting |
Total sum of percentage of processor time for 8 processor computer | 19 | 192 | 5.78 | 100 | 13.4 |
SQL Server buffer hit ratio | 99 | 100 | 100 | 100 | 100 |
% Disk time | 7,142 | 535 | 5.28 | 59.3 | 98.2 |
Disk queue length | 357 | 28.6 | 0.26 | 2.97 | 4.91 |
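The processor figures in the table above are totals for an 8 processor computer, so a value greater than 100 does not by itself indicate a saturated server. As a rough aid to reading them, the following Python sketch divides each total by the number of processors. It assumes that the total is the sum of the per-processor percentages (so the maximum would be 800), which this article does not state explicitly. The function name is hypothetical.

```python
PROCESSOR_COUNT = 8  # from the row label: "8 processor computer"

# Totals copied from the table above, keyed by configuration and database role.
total_processor_time = {
    "1S+R (staging + reporting)": 19,
    "1S1R (staging)": 192,
    "1S1R (reporting)": 5.78,
    "2S1R (staging)": 100,
    "2S1R (reporting)": 13.4,
}


def average_per_processor(total_percent: float,
                          processors: int = PROCESSOR_COUNT) -> float:
    """Average utilization per processor, assuming the total is a plain sum."""
    return total_percent / processors


for role, total in total_processor_time.items():
    print(f"{role}: {average_per_processor(total):.1f}% average per processor")
```

Under this assumption, the busiest server in the test (the staging server in the 1S1R configuration) averages about a quarter of its processor capacity.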
Other factors
Many other factors can affect the performance of the various analytics components and, therefore, capacity planning. These factors primarily affect the performance of the Report Extractor component, because they can affect the size of the data that is aggregated each day. The total size of the data in the reporting database also affects the performance of the Report Extractor, although this effect is not significant because the data is partitioned daily. Some of these other factors are as follows:
Number of unique queries each day.
Number of unique users each day.
Total number of unique assets clicked each day.
Existing data size in the reporting warehouse, based on the data retention in the warehouse.
The overall effect of these factors is less significant than the total data volume and the number of site entities. However, it is important to conduct your own capacity management based on your planned workload and usage characteristics. For more information about the capacity management process, see Performance and capacity management (SharePoint Server 2010).
See Also
Concepts
Performance and capacity management (SharePoint Server 2010)
SharePoint 2010 Administration Toolkit (SharePoint Server 2010)