

Enterprise intranet collaboration environment lab study (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010

This article contains guidance on performance and capacity planning for an enterprise intranet collaboration solution that is based on Microsoft SharePoint Server 2010. It includes the following:

  • Lab environment specifications, such as hardware, farm topology and configuration

  • Test farm dataset

  • Test results and analysis, which should help you determine the hardware, topology, and configuration that you need to deploy a similar environment and to optimize it for appropriate capacity and performance characteristics

In this article:

  • Introduction to this environment

  • Glossary

  • Overview

  • Specifications

  • Results and Analysis

Introduction to this environment

This document provides guidance about scaling out and scaling up servers in a SharePoint Server 2010 enterprise intranet collaboration solution, based on a testing environment at Microsoft. Capacity planning informs decisions on purchasing hardware and making system configurations to optimize your solution.

Different scenarios have different requirements. Therefore, it is important to supplement this guidance with additional testing on your own hardware and in your own environment. If your planned design and workload resemble the environment described in this document, you can use this document to draw conclusions about scaling your environment up and out.

This document includes the following:

  • Specifications, which include hardware, topology, and configuration

  • The workload, which is the demand on the farm, includes the number of users, and the usage characteristics

  • The dataset, such as database sizes

  • Test results and analysis for scaling out Web servers

  • Test results and analysis for scaling up Web servers

  • Test results and analysis for scaling out database servers

  • Comparison between Microsoft Office SharePoint Server 2007 and SharePoint Server 2010 in terms of throughput and the effect on the Web and database servers

The SharePoint Server 2010 environment described in this document is a lab environment that mimics a production environment at a large company. The production environment hosts very important team sites and publishing portals that internal organizations, teams, and projects use for enterprise collaboration. Employees use that production environment to track projects, collaborate on documents, and share information within their organization. The environment includes a large number of small sites used for ad-hoc projects and small teams. For details about the production environment, see Enterprise intranet collaboration environment technical case study (SharePoint Server 2010).

Before reading this document, make sure that you understand the key concepts behind capacity management in SharePoint Server 2010. The following documentation will help you learn about the recommended approach to capacity management and provide context for helping you understand how to make effective use of the information in this document, and also define the terms used throughout this document.

Also, we encourage you to read the following:

Glossary

There are some specialized terms that you will encounter in this document. Here are some key terms and their definitions.

  • RPS: Requests per second. The number of requests received by a farm or server in one second. This is a common measurement of server and farm load.

    Note that requests differ from page loads; each page contains several components, each of which creates one or more requests when the page is loaded. Therefore, one page load creates several requests. Typically, authentication checks and events using insignificant resources are not counted in RPS measurements.

  • Green Zone: This is the state at which the server can maintain the following set of criteria:

    • The server-side latency for at least 75% of the requests is less than 1 second.

    • All servers have a CPU Utilization of less than 50%.

    Note

    Because this lab environment did not have an active search crawl running, the database server was kept at 40% CPU Utilization or lower, to reserve 10% for the search crawl load. This assumes Microsoft SQL Server Resource Governor is used in production to limit Search crawl load to 10% CPU.

    • Failure rate is less than 0.01%.
  • Red Zone (Max): This is the state at which the server can maintain the following set of criteria:

    • HTTP request throttling feature is enabled, but no 503 errors (Server Busy) are returned.

    • Failure rate is less than 0.1%.

    • The server-side latency is less than 3 seconds for at least 75% of the requests.

    • Database server CPU utilization is less than 80%, which allows for 10% to be reserved for the Search crawl load, limited by using SQL Server Resource Governor.

  • AxBxC (Graph notation): This is the number of Web servers, application servers, and database servers respectively in a farm. For example, 8x1x2 means that this environment has 8 Web servers, 1 application server, and 2 database servers.

  • MDF and LDF: SQL Server physical files. For more information, see Files and Filegroups Architecture.
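The Green Zone and Red Zone criteria above can be expressed as a small check. The following is a minimal sketch in Python, assuming a hypothetical `FarmSample` record whose field names are illustrative and do not come from any SharePoint API. The 40% database CPU limit for the Green Zone mirrors the reserve for search crawl that was used in this lab.

```python
from dataclasses import dataclass


@dataclass
class FarmSample:
    """One load-test measurement. Field names are hypothetical, for illustration only."""
    p75_latency_s: float     # server-side latency that 75% of requests stay under
    web_cpu_pct: float       # highest CPU utilization across Web and application servers
    db_cpu_pct: float        # highest CPU utilization across database servers
    failure_rate_pct: float  # failed requests as a percentage of all requests
    http_503_count: int      # "Server Busy" responses returned by request throttling


def in_green_zone(s: FarmSample, db_cpu_limit_pct: float = 40.0) -> bool:
    # 40% (not 50%) on the database tier reserves 10% for the search crawl load.
    return (
        s.p75_latency_s < 1.0
        and s.web_cpu_pct < 50.0
        and s.db_cpu_pct <= db_cpu_limit_pct
        and s.failure_rate_pct < 0.01
    )


def in_red_zone(s: FarmSample) -> bool:
    return (
        s.http_503_count == 0
        and s.failure_rate_pct < 0.1
        and s.p75_latency_s < 3.0
        and s.db_cpu_pct < 80.0
    )


def classify(s: FarmSample) -> str:
    """Return the strictest zone whose criteria the sample satisfies."""
    if in_green_zone(s):
        return "green"
    if in_red_zone(s):
        return "red (max)"
    return "overloaded"
```

A sample is tested against the Green Zone first because its limits are stricter. Anything that fails both sets of criteria is reported as overloaded, which is a label chosen for this sketch and not a term used in this document.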

Overview

This section provides an overview of our scaling approach, of the relationship between this lab environment and a similar case study environment, and of our test methodology.

Scaling approach

This section describes the specific order that we recommend for scaling servers in your environment, and is the same approach we took for scaling this lab environment. This approach will enable you to find the best configuration for your workload, and can be described as follows:

  • First, we scaled out the Web servers. These were scaled out as far as possible under the tested workload, until the database server became the bottleneck and was unable to accommodate any more requests from the Web servers.

  • Second, we scaled out the database server by moving half of the content databases to another database server. At this point, the Web servers were not creating sufficient load on the database servers. Therefore, we scaled out the Web servers further.

  • Third, to test scale up, we tried the alternative of scaling up the Web servers instead of scaling them out. Scaling out the Web servers is generally preferred over scaling them up because scaling out provides better redundancy and availability.
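The order described above amounts to a simple rule: add Web servers until the database tier saturates, then add database capacity. The following is a rough Python sketch of that rule. The function name is hypothetical and both thresholds are assumptions chosen for illustration: the 80% database limit echoes the Red Zone definition in the Glossary, and the 90% Web server limit is not a figure from this document.

```python
def next_scaling_step(web_cpu_pct: float, db_cpu_pct: float,
                      web_limit_pct: float = 90.0,
                      db_limit_pct: float = 80.0) -> str:
    """Suggest which tier to scale next, given peak CPU utilization per tier."""
    if db_cpu_pct >= db_limit_pct:
        # The database tier is saturated, so more Web servers cannot add throughput.
        return "scale out database servers"
    if web_cpu_pct >= web_limit_pct:
        # The database tier still has headroom, so the Web tier is the limit.
        return "scale out Web servers"
    return "no tier saturated; increase load and measure again"


print(next_scaling_step(web_cpu_pct=60.0, db_cpu_pct=82.0))
```

The sketch considers CPU only. As the results later in this document show, disk IOPs can also become the bottleneck, so a real decision would look at more counters than these two.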

Correlating the lab environment with a production environment

The lab environment outlined in this document is a smaller scale model of a production environment at Microsoft, and although there are significant differences between the two environments, it can be useful to examine them side by side because both are enterprise collaboration environments where the patterns observed should be similar.

The lab environment contains a subset of the data from the production environment, and its workload is slightly modified. This influences the test results for Web server memory usage, because the object cache in the production environment receives more hits on unique sites and therefore uses more memory. The lab environment also has less data, most of which is cached in memory. The production environment, by contrast, carries over seven terabytes of data, so its database server must perform more disk reads than the database server in the lab environment. Similarly, the hardware used in the lab environment differs significantly from the hardware in the production environment it models, because there is less demand on those resources. The lab environment relies on more easily available hardware.

To get a better understanding of the differences between the environments, read the Specifications section in this document, and compare it to the specifications in the Enterprise intranet collaboration environment technical case study (SharePoint Server 2010).

Methodology and Test Notes

This document provides results from a test lab environment. Because this was a lab environment and not a production environment, we were able to control certain factors to show specific aspects of performance for this workload. In addition, certain elements of the production environment, listed here, were left out of the lab environment to simplify testing overhead. We do not recommend omitting these elements for production environments.

  • Between test runs, we modified only one variable at a time, to make it easy to compare results between test runs.

  • The database servers that were used in this lab environment were not part of a cluster because redundancy was not necessary for the purposes of these tests.

  • Search crawl was not running during the tests, whereas it might be running in a production environment. To take this into account, we lowered the SQL Server CPU utilization in our definition of ‘Green Zone’ and ‘Max’ to accommodate the resources that a search crawl would have consumed if it were running at the same time as our tests. To learn more about this, read Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the lab environment.

Hardware

The following sections describe the hardware that was used in this lab environment.

Web and Application servers

There are from one to eight Web servers in the farm, plus one Application server.

Web Server WFE1-8 and APP1

  • Processor(s): 2 quad-core 2.33 GHz processors

  • RAM: 8 GB

  • Operating system: Windows Server 2008 R2

  • Size of the SharePoint drive: 80 GB

  • Number of network adapters: 2

  • Network adapter speed: 1 Gigabit

  • Authentication: Windows NTLM

  • Load balancer type: Windows NLB

  • Services running locally:

    • WFE 1-8: Basic Federated Services. This included the following: Timer Service, Admin Service, and Trace Service.

    • APP1: Word Automation Services, Excel Services and SandBoxed Code Services.

Database Servers

The farm has two or three database servers: one or two that run the default SQL Server instance housing the content databases, and one that runs the logging database. The logging database is not tracked in this document.

Note

If you enable usage reporting, we recommend that you store the logging database on a separate Logical Unit Number (LUN). For large deployments and some medium deployments, a separate LUN will not be sufficient, as the demand on the server’s CPU may be too high. In that case, you will need a separate database server box for the logging database. In this lab environment, the logging database was stored in a separate instance of SQL Server, and its specifications are not included in this document.

Database Server – Default Instance DB1-2

  • Processor(s): 4 dual-core 3.19 GHz processors

  • RAM: 32 GB

  • Operating system: Windows Server 2008 R2

  • Storage and geometry: Direct Attached Storage (DAS)

    • Internal Array with 5 x 300GB 10krpm disk

    • External Array with 15 x 450GB 15krpm disk

    • 6 x Content Data (External RAID0, 2 spindles 450GB each)

    • 2 x Content Logs (Internal RAID0, 1 spindle 300GB each)

    • 1 x Temp Data (Internal RAID0, 2 spindles 150GB each)

    • 1 x Temp Log (Internal RAID0, 2 spindles 150GB each)

    • 2 x Backup drive (Internal RAID0, 1 spindle each, 300GB each)

  • Number of network adapters: 1

  • Network adapter speed: 1 Gigabit

  • Authentication: Windows NTLM

  • Software version: SQL Server 2008 R2 (pre-release version)

Topology

The following diagram shows the topology in this lab environment:

Farm topology diagram for this lab environment

Configuration

To allow for optimal performance, the following configuration changes were made in this lab environment.

Site Collection

  • Blob Caching: On. The default is Off. Enabling Blob Caching improves server efficiency by reducing calls to the database server for static page resources that may be frequently requested.

Database Server – Default Instance

  • Max degree of parallelism: 1. The default is 0. To ensure optimal performance, we strongly recommend that you set max degree of parallelism to 1 for database servers that host SharePoint Server databases. For more information about how to set max degree of parallelism, see max degree of parallelism Option (https://go.microsoft.com/fwlink/p/?LinkId=189030).
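As an illustration of the max degree of parallelism setting, the following Python sketch builds the T-SQL batch that would apply it with `sp_configure`. The helper name `maxdop_tsql` is hypothetical. The sketch only produces the statements as text and assumes you would run them yourself, against the instance that hosts the SharePoint Server databases, with whatever tool and permissions your environment requires.

```python
def maxdop_tsql(value: int = 1) -> str:
    """Return a T-SQL batch that sets the instance-wide max degree of parallelism."""
    if value < 0:
        raise ValueError("max degree of parallelism cannot be negative")
    return "\n".join([
        # The option is an advanced setting, so advanced options must be visible first.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max degree of parallelism', {value};",
        "RECONFIGURE;",
    ])


print(maxdop_tsql())
```

Generating the batch as text keeps the sketch free of any database driver dependency. Review the output before you run it on a server.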

Workload

The transactional mix for the lab environment described in this document resembles the workload characteristics of a production environment at Microsoft. For more information about the production environment, see Enterprise intranet collaboration environment technical case study (SharePoint Server 2010).

Here are the details of the mix for the lab tests run against SharePoint Server 2010 compared to the production environment. Although there are some minor differences in the workloads, both represent a typical transactional mix on an enterprise collaboration environment.

Chart showing workload for test environment

Dataset

The dataset for the lab environment described in this document is a subset of the dataset from a production environment at Microsoft. For more information about the production environment, see Enterprise intranet collaboration environment technical case study (SharePoint Server 2010).

Dataset characteristics

  • Database size (combined): 130 GB

  • BLOB size: 108.3 GB

  • Number of content databases: 2

  • Number of site collections: 181

  • Number of Web applications: 1

  • Number of sites: 1384

Results and Analysis

The following results are ordered based on the scaling approach described in the Overview section of this document.

Web Server Scale Out

This section describes the test results that were obtained when we scaled out the number of Web servers in this lab environment.

Test methodology

  • Add Web servers of the same hardware specifications, keeping the rest of the farm the same.

  • Measure RPS, latency, and resource utilization.

Analysis

In our testing, we found the following:

  • The environment scaled out to four Web servers per database server. However, the increase in throughput was non-linear, especially when the fourth Web server was added.

  • Beyond four Web servers, adding more Web servers produced no additional gains in throughput, because the bottleneck at that point was the database server CPU utilization.

  • The average latency was almost constant throughout the whole test, unaffected by the number of Web servers and throughput.
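One way to quantify the non-linear scaling described above is to compare the measured RPS with ideal linear scaling and to look at the RPS gained by each added server. The following Python sketch shows both calculations. The helper names are hypothetical, and the sample readings are invented for illustration; they are not the values measured in this lab.

```python
from typing import Dict


def scaling_efficiency(rps_by_servers: Dict[int, float]) -> Dict[int, float]:
    """Ratio of measured RPS to ideal linear RPS (n times the single-server RPS)."""
    base = rps_by_servers[1]
    return {n: rps / (n * base) for n, rps in sorted(rps_by_servers.items())}


def marginal_gain(rps_by_servers: Dict[int, float]) -> Dict[int, float]:
    """RPS added by each extra server, keyed by the server count after the addition."""
    counts = sorted(rps_by_servers)
    return {b: rps_by_servers[b] - rps_by_servers[a]
            for a, b in zip(counts, counts[1:])}


# Hypothetical readings, NOT the lab's measured values.
sample = {1: 100.0, 2: 195.0, 3: 280.0, 4: 330.0, 5: 332.0}

for n, eff in scaling_efficiency(sample).items():
    print(f"{n} Web server(s): {eff:.0%} of linear scaling")
print(marginal_gain(sample))
```

When the marginal gain falls close to zero, as it does for the fifth server in the invented data, another tier has become the bottleneck and adding Web servers no longer helps.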

Note

The conclusions described in this section are hardware specific. The same throughput might have been achieved by a larger number of lower-end servers or by a smaller number of higher-end servers. Similarly, changing the hardware of the database server would affect the results. To see how much the Web server hardware can affect these results, see the Web Server Scale Up section.

Results graphs and charts

In the following graphs, the x axis shows the change in the number of Web servers in the farm, scaling from one Web server (1x1x1) to five Web servers (5x1x1).
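If you tabulate your own test results by topology, the AxBxC labels used on these axes can be parsed into server counts. The following is a minimal Python sketch. The `Topology` type and the `parse_topology` function are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Topology(NamedTuple):
    web: int  # number of Web servers
    app: int  # number of application servers
    db: int   # number of database servers


def parse_topology(label: str) -> Topology:
    """Parse a label such as '8x1x2' into Web, application, and database server counts."""
    parts = label.strip().lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected AxBxC notation, got {label!r}")
    web, app, db = (int(p) for p in parts)
    return Topology(web, app, db)


print(parse_topology("5x1x1"))
```

A named tuple keeps the three counts readable in later calculations, for example when dividing farm RPS by `web` to get per-server throughput.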

1. Latency and RPS

The following graph shows how scaling out (adding Web servers) affects latency and RPS.

Chart with RPS and Latency across WFE scale out

2. Processor utilization

The following graph shows how scaling out the Web servers affects processor utilization on the Web server(s) and the database server.

Chart with processor utilization at WFE scale out

3. SQL Server I/O operations per second (IOPs) for MDF and LDF files

The following graphs show how the IOPs on the content databases change as the number of Web servers is scaled out. These are measured by looking at the following performance counters:

  • PhysicalDisk: Disk Reads / sec

  • PhysicalDisk: Disk Writes / sec

In this lab environment, we determined that our data on IOPs was not representative of a production environment, because our dataset was so small that much more of it fit in cache than would be possible in the production environment we are modeling. We therefore calculated projected reads by multiplying the writes/second measured in the lab by the ratio of reads to writes in our production environment. The results in this section are averages; spikes that occur during certain operations must also be accounted for. To learn more about how to estimate the IOPs needed, see Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
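The projection described above is a single multiplication. The following Python sketch shows it with invented numbers. The function name and all three input values are assumptions for illustration and are not measurements from the lab or the production environment.

```python
def projected_reads_per_sec(lab_writes_per_sec: float,
                            prod_reads_per_sec: float,
                            prod_writes_per_sec: float) -> float:
    """Project lab reads/sec from lab writes/sec and the production read:write ratio."""
    if prod_writes_per_sec <= 0:
        raise ValueError("production writes/sec must be positive")
    read_write_ratio = prod_reads_per_sec / prod_writes_per_sec
    return lab_writes_per_sec * read_write_ratio


# Hypothetical inputs: 120 writes/sec in the lab, 900 reads/sec and
# 300 writes/sec in production, which gives a read:write ratio of 3.
print(projected_reads_per_sec(120.0, 900.0, 300.0))
```

Because the result is an average, it does not capture the spikes mentioned above, so peak IOPs needs separate headroom.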

Maximum:

Chart with IOPs at Web server scale out maximum

Green Zone:

Chart with IOPs at Web server scale out greenzone

Example of how to read these graphs:

An organization whose workload resembles the one described in this document, and that expects 300 RPS in its Green Zone, could use a 3x1x1 topology and would need approximately 600 physical disk reads/sec on the MDF file.

Database Server Scale Out

This section describes the test results that were obtained when we scaled out the number of database servers in this lab environment.

Test methodology

  • Have two content databases on one database server, and then split them to two servers to effectively double the processor cores and memory available to the database servers in the environment.

  • Keep the total IOPs capacity constant even while adding a database server. This means that the number of reads/sec and writes/sec that the disks could perform for each content database did not change despite splitting the content onto two database servers instead of one.

Analysis

  • The first bottleneck in the 4x1x1 environment was the database server CPU utilization. There was close to a linear scale when we added more processor and memory power.

  • Scaling to four Web servers and 2 database servers did not provide much additional RPS because the CPU utilization on the Web servers was close to 100%.

  • When we scaled out database servers (by adding one additional database server) and added four Web servers, performance scaled almost linearly. The bottleneck at that point shifted from being the database server CPU utilization to the content database disk IOPs.

  • No additional tests were performed in this lab environment to scale out past 8x1x2. However, we expect that additional IOPs capacity would further increase throughput.

  • A correlation between the IOPs used and the RPS achieved by the tests was observed.

Results graphs and charts

In the following graphs, the x axis always shows a farm of four Web servers, one application server, and one database server (4x1x1) scaling out to eight Web servers with two database servers (8x1x2). Some graphs also show 1x1x1 or 4x1x2.

1. Latency and RPS

The following graph shows how scaling out both Web servers and database servers affects latency and RPS.

Chart with RPS and Latency at database scale

2. Processor utilization

The following graphs show how scaling out affects processor utilization.

Chart with processor utilization at database scale

3. Memory utilization at scale out

Throughout our testing, we observed that memory consumption grows with the number of site collections in an environment. For example, in the tests here where 181 site collections were accessed, the main w3wp process used up 1.8 GB of RAM. For more examples, see the Performance and capacity technical case studies (SharePoint Server 2010). Additional content about memory requirements for increased numbers of site collections is under development. Check back for new and updated content.

4. SQL Server I/O operations per second (IOPs) for MDF and LDF files

The following graphs show how the IOPs change as both the number of Web servers and the number of database servers are scaled out.

Maximum RPS

Chart with IOPs at database scale out maximum

Green Zone RPS

Chart with IOPs at database scale out greenzone

Web server Scale Up

This section describes the test results that were obtained when we scaled up the Web servers in this lab environment.

Test methodology

  • Add more Web server processors, but keep the rest of the farm the same.

Analysis

  • Scale is linear up to eight processor cores.

  • Tests show that the environment can take advantage of a twenty-four core box, although there is some degradation as twenty-four cores are approached.

Results graphs and charts

In the following graph, the x axis is the number of processors and the amount of RAM on the Web server. The following graph shows how scaling up (adding processors) affects RPS on the Web server.

Chart with RPS at scale up

Comparing SharePoint Server 2010 and Office SharePoint Server 2007

This section provides information about how the capacity testing for this workload varied between SharePoint Server 2010 and Microsoft Office SharePoint Server 2007.

Workload

To compare SharePoint Server 2010 with Office SharePoint Server 2007, a different test mix was used than the one outlined in the Specifications section, because some SharePoint Server 2010 operations were not available in Office SharePoint Server 2007. The test mix for Office SharePoint Server 2007 is inspired by the same production environment that the SharePoint Server 2010 tests follow. However, the mix was recorded before that environment was upgraded to SharePoint Server 2010.

The following graph shows the test mix for the lab and production environments for Office SharePoint Server 2007.

Chart with transaction mixes for environments

Test methodology

  • For this comparison, we created an Office SharePoint Server 2007 environment, tested it with the workload outlined earlier in this section, and then upgraded the content databases to SharePoint Server 2010 without changing the clients that used the environment and without doing a visual upgrade. We then retested the upgraded environment for the SharePoint Server 2010 results with the same test mix, which includes only Office SharePoint Server 2007 operations.

  • The dataset was not modified after the content database upgrade for the SharePoint Server 2010 tests.

  • The test mix for Office SharePoint Server 2007 excludes any new SharePoint Server 2010 specific operations, and resembles the enterprise intranet collaboration solution on the same production environment for Office SharePoint Server 2007, as described under the Workload section.

Analysis

  • When the same number of Web servers are stressed to their maximum throughput on SharePoint Server 2010 and Office SharePoint Server 2007, SharePoint Server 2010 achieves 20% less throughput compared to Office SharePoint Server 2007.

  • When the Web servers were scaled out to maximize the database server usage, SharePoint Server 2010 was able to achieve 25% better throughput compared to Office SharePoint Server 2007. This reflects the improvements that were made in SharePoint Server 2010 to sustain larger deployments.

  • When the Web servers were scaled out to maximize the database server usage, SharePoint Server 2010 was bound by SQL Server CPU utilization, whereas Office SharePoint Server 2007 was bound by locks on the database tier. This means that increasing the processing power available to the database servers enables SharePoint Server 2010 to achieve better throughput than would be possible with the same hardware using Office SharePoint Server 2007. The locking mechanisms in the Office SharePoint Server 2007 database are unaffected by improved hardware, which is why we were unable to push the database server’s CPU utilization past 80% on Office SharePoint Server 2007.

  • As a result of the findings outlined earlier in this section, the maximum possible throughput on Office SharePoint Server 2007 was achieved in a 5x0x1 topology, whereas on SharePoint Server 2010 the maximum possible throughput with the same workload was achieved in a 7x0x1 topology and yielded 25% higher total RPS.

Results graphs and charts

The following graph shows the throughput without scaling out Web servers.

Chart with throughput before scaling out

The following graph shows the throughput when Web servers were at maximum scale out.

Chart with throughput at maximum Web server scale

See Also

Other Resources

Resource Center: Capacity Management for SharePoint Server 2010