Physical Database Storage Design

Article
01/28/2010

Published: June 26, 2006 | Updated: February 12, 2007

SQL Server Technical Article

Writers: Kathy Lu, Lewis Bruck

Technical Reviewers: Robert Dorr, Paul Randal, Conor Cunningham, Wei Xiao, Don Vilen, Kangrong Yan, Peter Byrne

Applies To: SQL Server 2005 SP1

Summary: This article provides a guide for physical storage design and gives recommendations and trade-offs for physical hardware design and file architecture.

Introduction

This physical storage design guide is written to help database architects and administrators configure Microsoft SQL Server 2005 systems for optimal I/O performance and space utilization. This article also emphasizes the new SQL Server 2005 features that are significant to these discussions.

This paper is separated into three sections. The first section focuses on database file storage design, with the main focus on the new SQL Server 2005 features. The second section describes the design considerations for the physical hardware of the server system. This includes topics about system components such as disks, interfaces, buses, and RAID levels, and how each of these hardware components measures up with respect to the design criteria. The third section reviews the different types of workloads and the I/O requirements for various sizes of applications.

Executive Summary

Physical Database Design Steps

Designing a physical database requires that you consider multiple aspects and make sure that the design decisions integrate well with one another. The following are the recommended steps, in order, for approaching this task.

1. Characterize the I/O workload of the application. For more information, see the “Characterizing Application Workload” section later in this paper.
2. Determine reliability and performance requirements for the database system.
3. Determine the hardware required to support the decisions made in steps 1 and 2.
4. Configure SQL Server 2005 to take advantage of the hardware in step 3.
5. Track performance as the workload changes.

Recommendations

The following is the list of recommendations that are made throughout this paper. Detailed information is provided in later sections of this paper.

Always use Page Checksum to audit data integrity.
Consider using compression for read-only filegroups for higher storage efficiency.
Use NTFS for security and availability of many SQL Server 2005 features.
Use instant file initialization for performance optimization.
Use manual file growth database options.
Use partitioning (available in Enterprise Edition) for better database manageability.
Storage-align indexes with their respective base tables for easier and faster maintenance.
Storage-align commonly joined tables for faster joins and better maintenance.
Choose your RAID level carefully. For more information, see Table 1A in Appendix A later in this paper. For excellent performance and high reliability of both read and write data patterns, use RAID10. For read-only data patterns, use RAID5.
For optimized I/O parallelism, use 64 KB or 256 KB stripe size.
For future scalability and ease of maintenance, use volume mount points.
To increase bus bandwidth reliability, use multipathing software.
For small servers with fewer than three disks performing mostly sequential I/O, or servers with approximately eight disks performing random I/O, PCI is sufficient. However, PCI-X is recommended and can service a wider range of servers with varying workload sizes.
Directly attached I/O is recommended for small- to medium-sized servers.
SAN systems are recommended for larger servers.
NAS systems are not recommended. Use iSCSI instead.
For better recoverability, use a SCSI interface instead of SATA and IDE.
For larger server loads, use SCSI or SATA with TCQ support.
Store transaction logs separate from data files. Do not stripe on the same disk as the data files.
For large bandwidth demands on the I/O bus, use a different bus for the transaction log files.
The number of data files within a single filegroup should equal the number of CPU cores.

Database File Storage Architecture

Before you decide on the hardware configurations, such as whether to use RAID5 or RAID10, how many spindles to use, or what interface to use, you should determine how the database file architecture will be configured.

This section describes the new SQL Server 2005 features that are intended to improve many aspects of database storage design.

New SQL Server 2005 Features

In SQL Server 2005, there are several new features that are geared towards improving the performance, reliability, availability, capacity, and manageability of database files. There are also a few features that SQL Server 2005 DBAs must know about that could affect physical design choices.

The new features in the spotlight are the following:

Page checksum and page-level restore combination
Read-only filegroups on compressed drives
Instant file initialization
Database snapshots
Row-level versioning
Partitioning

Page Checksum and Page-Level Restore

With the introduction of the page checksum feature, SQL Server 2005 can help increase data protection. The page checksum feature can detect disk I/O errors that are not reported by the operating system or the underlying hardware. Page checksums are created and verified for both the data and the log blocks when the PAGE_VERIFY option is set to CHECKSUM for the database. Page checksum is the preferred alternative to the previous TORN_PAGE_DETECTION option and is on by default. This option can be changed by using ALTER DATABASE ... SET PAGE_VERIFY.

A checksum of the page is calculated, verified, or both at these times:

Checksum is calculated when a page is written to disk from the buffer pool.
Checksum is calculated and verified when a page is read from disk into the buffer pool, provided that the page was previously written to disk with a checksum and the PAGE_VERIFY option is enabled.
Checksum is calculated and verified during BACKUP and RESTORE commands.
- During a BACKUP DATABASE operation, page checksums are verified and a backup checksum is created on the backup media when the backup CHECKSUM option is specified. The default behavior is NO_CHECKSUM.
- By default, during RESTORE DATABASE and RESTORE VERIFYONLY operations, checksums are verified when the backup checksum is present on the backup media. If the overall checksum of the backup stream is not present in the backup media, no verification is performed.
  - By default, if the CHECKSUM option is specified and the backup media contains the checksum, checksum verification is performed. If the CHECKSUM option is specified and no checksum is present on the backup media, then the restore operation fails and returns an error message that indicates the absence of the checksum.
  - If the NO_CHECKSUM option is specified, the checksum validation is disabled.

The checksum is stored in the page header that already exists, so no extra storage is required. Although there is a slight performance cost in the calculation of the checksum, we strongly recommend using page checksum verification at all times to help preserve data integrity.
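The write-time and read-time checks described above can be pictured with a small sketch. This is an illustration only: it uses Python's zlib.crc32 as a stand-in checksum and a made-up layout that prepends four bytes to the payload, whereas SQL Server stores its checksum inside the existing page header and this paper does not describe the algorithm it uses. The function names are hypothetical.

```python
import zlib

CHECKSUM_BYTES = 4  # illustrative slot size, not the real page header layout


def write_page(payload: bytes) -> bytes:
    """Build the on-disk image: checksum computed at write time, then payload."""
    checksum = zlib.crc32(payload)  # stand-in algorithm (assumption)
    return checksum.to_bytes(CHECKSUM_BYTES, "little") + payload


def read_page(image: bytes) -> bytes:
    """Recompute the checksum at read time and fail if it no longer matches."""
    stored = int.from_bytes(image[:CHECKSUM_BYTES], "little")
    payload = image[CHECKSUM_BYTES:]
    if zlib.crc32(payload) != stored:
        # Loosely analogous to the server reporting a consistency error
        raise IOError("checksum mismatch: page changed after it was written")
    return payload
```

A page that comes back from disk unchanged passes through read_page untouched. A page whose bytes were altered after the write raises an error instead of silently returning damaged data, which is the behavior the recommendation relies on.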

When a page checksum error is detected, the server returns an 824 error:

SQL Server detected a logical consistency-based I/O error: <error type>. It occurred during a <operation> of page <pageid> in database ID <dbid> at offset <file offset> in file <fileid>. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The DBA can restore the affected pages by using online page-level restore. This new feature is available in SQL Server 2005 Enterprise Edition. Online page-level restore gives the server better availability in two ways:

Restoring less data. This reduces the time required to copy and recover.
Restoring only a small part of the database. This leaves the majority of the database online for continued use.

For more information about page checksum and page-level restore, see “Detecting and Coping with Media Errors” in SQL Server Books Online.

Data Compression for Read-Only Filegroups

SQL Server 2005 offers support for read-only filegroups on compressed drives. This helps to improve capacity utilization, manageability, and I/O performance. Any secondary filegroup of a user database that is marked as read-only can be placed on compressed NTFS volumes. If the entire database is set to read-only, all filegroups can be placed on compressed volumes.

Compressed read-only filegroups prevent updates while saving disk space by compressing these files. From a manageability standpoint, when the filegroup is set to read-only and compressed, there is almost no maintenance required for this filegroup. You should still perform some DBCC CHECK activities on these volumes occasionally to detect possible media deterioration. Because the files in the filegroups are compressed and there is less data to read from the disk, I/O performance frequently improves. However, CPU performance might decrease because of the processing time that is required.

An internal performance test comparing compressed and uncompressed read-only files shows this I/O vs. CPU performance tradeoff:

The test was performed on a quad Xeon 700MHz processor machine that was running Windows Server 2003.
The test used a dynamically striped volume of five physical disks at 10,000 RPM.
The test performed a scan by using a simple SELECT COUNT(*) statement on a table of approximately 90,000 rows of data in a 3.5 GB read-only file.
The compression ratio was 8.0 to 1.

The observed counters and results of this test are shown in Table 1.

Table 1: Test Results for Scan on Compressed and Uncompressed Read-Only Filegroup

Test Scenario	Average CPU Processing Time	Average Physical Disk Bytes/sec	Average Elapsed Time	File Size
Uncompressed file	Spikes at 42% at the beginning. Remains below 16% the remainder of the time.	Maintains 14 MB/sec uniformly for all 5 physical disks	58 seconds	3.5 GB
Compressed file	Spikes at 40% at the beginning. Remains below 20% the remainder of the time.	Maintains 750 KB/sec uniformly for all 5 disks	105 seconds	.44 GB

Note that the I/O vs. CPU performance tradeoff scenario might be different for different workloads and environments.
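The figures in Table 1 can be cross-checked with a little arithmetic. The sketch below only restates the published numbers; the unit conversions (1 GB = 1024 MB, 750 KB treated as 0.75 MB) and all variable names are assumptions made for illustration.

```python
# Figures copied from Table 1 and the test description.
DISK_COUNT = 5
LOGICAL_DATA_GB = 3.5  # both scans read the same 3.5 GB of table data

runs = {
    "uncompressed": {"file_gb": 3.5, "elapsed_s": 58, "per_disk_mb_s": 14.0},
    # 750 KB/sec per disk, converted with the assumption 1 MB = 1000 KB
    "compressed": {"file_gb": 0.44, "elapsed_s": 105, "per_disk_mb_s": 0.75},
}


def aggregate_disk_mb_s(run):
    """Physical throughput summed over all five disks."""
    return run["per_disk_mb_s"] * DISK_COUNT


def logical_scan_mb_s(run):
    """Rate at which table data was scanned (assumes 1 GB = 1024 MB)."""
    return LOGICAL_DATA_GB * 1024 / run["elapsed_s"]


compression_ratio = runs["uncompressed"]["file_gb"] / runs["compressed"]["file_gb"]
elapsed_ratio = runs["compressed"]["elapsed_s"] / runs["uncompressed"]["elapsed_s"]
```

Under these assumptions, the computed compression ratio is about 7.95, which agrees with the stated 8.0 to 1. The compressed scan moves roughly 3.75 MB/sec from disk in aggregate compared with 70 MB/sec for the uncompressed scan, but takes about 1.8 times as long to finish.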

This feature is particularly useful in situations where large portions of the database contain historical information, are used only for analysis or forecasting, and there is limited disk space.

For more information about how to convert a filegroup to read-only and compressed, see “Read-Only Filegroups” in SQL Server Books Online.

For scalable shared database systems, data compression on read-only volumes is a practical option. For more information, see “Overview of Scalable Shared Databases” in SQL Server Books Online.

Instant File Initialization

SQL Server 2005 added support for the Windows instant file initialization feature. By default, when a file is created or grows larger, the pages in that file are initialized by writing zeros before the file gets used. This overwrites any existing data that remains on the disk. Instant initialization is used only for data files (not log files) and is enabled when the account running SQL Server has the Windows SE_MANAGE_VOLUME_NAME privilege, which is available only on Microsoft Windows XP, Windows Server 2003, or later versions. File initialization occurs in the following scenarios:

During file creation
- CREATE DATABASE, including tempdb creation at server startup.
- RESTORE DATABASE
During file modification
- ALTER DATABASE...MODIFY FILE.
- Modifications that result in autogrow activity.

The zeroing process can affect performance, especially during modifications that trigger an autogrow. SQL Server 2005 offers the instant file initialization feature. This skips zeroing out of data pages. Instant file initialization reduces time during the creation of very large databases and tempdb.

The previous data on newly allocated space is never zeroed. The previous data gets overwritten with new data when the data page is actually allocated. Avoiding zeroing out applies to any application that is running with the Windows SE_MANAGE_VOLUME_NAME privilege, and is making the call to the SetFileValidData Windows API.

Note: If SQL Server is running under the Network Service account, this privilege is not granted by default.

For more information, see “Database File Initialization” in SQL Server Books Online.

There are conditions where the old data can be located by an application designed to use the extended privilege. Any application taking advantage of the SE_MANAGE_VOLUME_NAME privilege can uncover old data on disk without requiring SQL Server.

SQL Server provides strong ACL security, but it is always good practice to ensure these files have restrictive ACLs. The types of files and their respective ACLs are displayed in Table 2.

Table 2: Potentially Exposed-Unwritten-Data Scenarios and the Protecting DACLs

Scenario	ACLs
Database files (.mdf, .ndf) when SQL Server is shut down	The ACLs remain with the Administrator group and Service Account that is running SQL Server.
Detached database files	If the database auto_close option is turned off, the ACLs are set to the Windows account that performed the database detach. If the auto_close option is on, the ACL does not change and remains with the Service Account and the Local Administrator Account. Note that only sysadmin accounts can detach a database.
Backup database files	The backup database file ACLs are inherited from the ACLs of the parent folder that contains those files. This is set by the sysadmin, db_owner, or db_backupoperator that specifies the location of the backup file.

Note: SQL Server tracks database allocations and never exposes un-initialized portions of the data file to a user. It is also not possible to simply copy or access a SQL Server data file and obtain the un-initialized data. By default, the operating system exposes un-initialized data as all zeros during any read operation. To obtain the raw data, the process must have the proper security account, privileges, and access tokens.

We recommend using instant file initialization for performance reasons. In-house testing for creating and growing files shows a significant improvement in performance when instant file initialization is used. This test was performed on two different servers:

The smaller server, named “Server A” in Table 3, was a single processor computer that was running Windows Server 2003 and using direct-attached IDE to a Windows Basic volume. This volume consisted of one 120 GB physical disk running at 7,200 RPM using the Intel 82801EB Ultra ATA storage controller.
The larger server, named “Server B” in Table 3, was a quad-core computer running Windows Server 2003 and using Ultra320 SCSI attached to a Windows Dynamic volume. This volume consisted of five striped physical disks of 36.4 GB each running at 15,000 RPM.

Because Server A is a smaller computer, it ran tests using 100 MB files. Server B ran tests with 1 GB files. When Server B ran with 100 MB increments, the benefit of instant file initialization was diminished because the disk cache on the server compensated for the time it takes to physically write the zeros. The performance results of this test are shown in Table 3.

Table 3: Performance Comparison of Instant Initialization vs. Regular Initialization

Server	Scenario	Instant Init (s)	Regular Init (s)	Difference (s)
Server A	Adding new 100 MB file	0.2	2.8	2.6
Server A	Modify file size to grow 100 MB	0.1	2.2	2.1
Server B	Adding new 1 GB file	0.1	15.2	15.1
Server B	Modify file size to grow 1 GB	0.1	18.5	18.4
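As a quick cross-check of Table 3, the following sketch recomputes the Difference column and derives a speedup factor for each row. The data is copied from the table; the function and field names are made up for illustration. Because the timings are reported to one decimal place, the speedup factors are rough indications only.

```python
# (server, scenario, instant init seconds, regular init seconds) from Table 3
TABLE_3 = [
    ("Server A", "Adding new 100 MB file", 0.2, 2.8),
    ("Server A", "Modify file size to grow 100 MB", 0.1, 2.2),
    ("Server B", "Adding new 1 GB file", 0.1, 15.2),
    ("Server B", "Modify file size to grow 1 GB", 0.1, 18.5),
]


def summarize(rows):
    """Return seconds saved and speedup factor for each test row."""
    summary = []
    for server, scenario, instant_s, regular_s in rows:
        summary.append({
            "server": server,
            "scenario": scenario,
            # Rounded to match the one-decimal precision of the table
            "saved_s": round(regular_s - instant_s, 1),
            "speedup": round(regular_s / instant_s, 1),
        })
    return summary


results = summarize(TABLE_3)
```

The recomputed differences match the published column, and the 1 GB operations on Server B show the largest absolute savings because more space has to be zeroed.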

Although instant file initialization provides faster file growth, it should not be used as a substitute for proper file planning. File autogrow allows for automatic file maintenance, but the default of 10% growth might not suit all database scenarios. It is a good practice to fine-tune the autogrow size of files in accordance with the specific needs of that database. It is also a good practice to manually set the autogrow size for data files, and/or manually grow files in preparation for database growth to avoid adverse effects on performance. For more information about how to determine the size of autogrow and the size of files, see SQL Server 2000 Operations Guide: Capacity and Storage Management. This information is still relevant in SQL Server 2005 for file sizing and autogrow sizing.
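To make the autogrow trade-off concrete, the sketch below counts how many growth events it takes to reach a target size with percentage growth versus a fixed increment. It is a simplified model: the starting size, target, and increment are hypothetical values, and any rounding the server applies to growth increments is ignored.

```python
def percent_growth_events(start_mb, target_mb, percent):
    """Grow by a percentage of the current size until the target is reached."""
    size, events = float(start_mb), 0
    while size < target_mb:
        size += size * percent / 100
        events += 1
    return events, size


def fixed_growth_events(start_mb, target_mb, increment_mb):
    """Grow by a fixed number of megabytes until the target is reached."""
    size, events = start_mb, 0
    while size < target_mb:
        size += increment_mb
        events += 1
    return events, size


# Hypothetical file: 1,000 MB today, expected to need 2,000 MB
pct_events, pct_final = percent_growth_events(1000, 2000, 10)
fix_events, fix_final = fixed_growth_events(1000, 2000, 250)
```

In this example, 10% growth takes eight growth events, each larger than the last, and overshoots the target, while a fixed 250 MB increment takes four events and lands exactly on it. Which policy fits depends on the workload, which is why the growth setting deserves deliberate tuning.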

Database Snapshots

By using the new database snapshot feature, SQL Server 2005 users can capture a persistent snapshot of a database at a point in time that can be accessed by multiple users. Database snapshots have many applications, including point-in-time database views for reporting purposes, viewing mirrored databases, and recovering from user errors. It is important for the DBA to understand the I/O requirements in terms of space and performance of this feature.

Storage Space: Snapshots take up disk space. However, compared to a full copy of the database, snapshots are generally more space efficient. Snapshots use the sparse file technology, which is provided only by the NTFS file system, to maintain this efficiency.

There is one snapshot file for each data file in the source database. A page is copied to a snapshot file when it is updated in the source database. This mechanism is referred to as copy-on-write. The combination of the pages in the snapshot and the pages that have not changed in the source database gives a consistent view of the database at the time the snapshot was taken. Remember that if all pages in the source database are updated, the snapshot file can grow to a similar size as the source database file.

I/O Performance: I/O performance is reduced when making a first-time update to a page on the source database because this leads to an extra write of the source page to the snapshot database. However, any subsequent changes to the copied page do not incur this extra write. Therefore, if a source database experiences only localized updates to certain pages, a snapshot database will have little effect on the I/O performance and storage efficiency. However, if there are many database snapshots of the same source database, a single modification to the source database can trigger an extra write to every snapshot that does not yet hold a copy of that page.
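The copy-on-write behavior and its I/O cost can be modeled in a few lines. This is a conceptual sketch, not how SQL Server implements snapshots: pages are plain dictionary entries, the class and attribute names are invented, and pages added after a snapshot is created are not handled.

```python
class Snapshot:
    """Sparse store that holds only pages changed in the source since creation."""

    def __init__(self, source):
        self.source = source
        self.pages = {}

    def read(self, page_id):
        # Serve the preserved copy if one exists, otherwise the unchanged source page
        if page_id in self.pages:
            return self.pages[page_id]
        return self.source.pages[page_id]


class SourceDatabase:
    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []
        self.extra_writes = 0  # copy-on-write writes caused by snapshots

    def create_snapshot(self):
        snapshot = Snapshot(self)
        self.snapshots.append(snapshot)
        return snapshot

    def write(self, page_id, data):
        # First update to a page: preserve the old image in every snapshot
        # that does not already hold a copy of it
        for snapshot in self.snapshots:
            if page_id not in snapshot.pages:
                snapshot.pages[page_id] = self.pages[page_id]
                self.extra_writes += 1
        self.pages[page_id] = data
```

In this model, repeated updates to the same page cost one extra write per snapshot in total, while each additional snapshot multiplies the cost of a first-time update. That matches the guidance that localized updates are cheap and many snapshots of a heavily updated database are not.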

Online DBCC CHECK: Database snapshots are created for online DBCC CHECK activity. The actual check is performed on the database snapshot to allow for concurrent access of the original database. This snapshot is created internally and stored on the same volume as the original database. The snapshot therefore requires physical disk space, possibly up to the same amount of space as the original table. Because the underlying database snapshot is created without user specification of physical location, and the volume might not have enough space, one of two things can happen:

ONLINE DBCC CHECK might revert to using DBCC CHECK WITH TABLOCK. This means that table locks will be taken on the original database instead of using the internal database snapshot. A short-term exclusive lock is also taken on the database. Running DBCC CHECK with TABLOCK reduces the concurrency that is available on the database during the DBCC CHECK activity.
DBCC CHECK might fail to run because it cannot create the internal database snapshot and cannot obtain the exclusive database lock.

Creating the internal database snapshot might cause problems if the volume is already full, or if the volume becomes full by creating the internal database snapshot. To avoid this, manually create a database snapshot of the source database, and then run DBCC CHECK on the snapshot. Manually creating the database snapshot lets the administrator create the snapshot in a volume with ample disk space. DBCC CHECK will run directly on the manually created snapshot, and will not create another internal database snapshot.

Row-Level Versioning

SQL Server 2005 includes a new technology called row-level versioning to support better concurrent access. This technology is integrated in the following new features:

Snapshot Isolation
Multiple Active Result Sets (MARS)
Online Index Operations

Note Using row-level versioning requires storage space in tempdb and affects tempdb I/O performance during both reads and updates. This is a result of the extra I/O work that is required to maintain and retrieve old versions. For more information on tempdb I/O performance, see the white paper .

When a row in a table or index is updated, the old row is copied to the version store (which exists in tempdb), and then is stamped with the transaction sequence number of that transaction performing the update. The new record holds a pointer to the old record in the version store. The old record might have a pointer to an even older record, which could point to a still older record, forming a version chain.
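A version chain of this kind can be sketched as a linked list. The model below is deliberately simplified and rests on assumptions: each version carries the sequence number of the transaction that wrote it, and a reader sees the newest version whose sequence number is not greater than its own. The real visibility rules also account for transactions that are still in flight. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    tsn: int  # sequence number of the transaction that wrote this version
    older: Optional["RowVersion"] = None  # pointer into the version store


def update(current: RowVersion, new_value: str, tsn: int) -> RowVersion:
    """Create the new record and point it at the version it replaces."""
    return RowVersion(new_value, tsn, older=current)


def read_as_of(current: RowVersion, reader_tsn: int) -> Optional[str]:
    """Walk the chain until a version old enough for the reader is found."""
    version = current
    while version is not None and version.tsn > reader_tsn:
        version = version.older
    return None if version is None else version.value
```

The sketch shows why long-running readers are costly: every hop along the chain stands for an extra lookup in the version store, and old versions have to be kept for as long as some reader might still need them.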

Versions can be created in the following scenarios:

In Snapshot Isolation for all updates and deletes.
When a trigger accesses modified rows (update/delete). This excludes instead-of triggers.
When a MARS session modifies a row (update/delete/insert) during pending reads.
For any online index build operations where there are concurrent DML operations on the same object.

For more information, see SQL Server 2005 beta 2 Snapshot Isolation, Database Concurrency and Row Level Versioning in SQL Server 2005, and “Understanding Row Versioning-Based Isolation Levels” in SQL Server 2005 Books Online.

Data Partitioning

The partitioning feature that is available on SQL Server 2005 Enterprise Edition might have the greatest influence on the physical database design. Partitioning horizontally divides the data (either tables or indexes) into smaller segments, or partitions, which can be maintained and accessed independently from each other. This improves manageability for very large or growing tables compared to the older method of partitioned views in SQL Server 2000.

Partitioning requires partitions to be assigned to filegroups, so many of the maintenance and performance gains of filegroups can be taken advantage of with a partition scheme. For more information about maintenance and performance gains, see Storage Engine Capacity Planning Tips.

Manageability

Partitioning in SQL Server 2005 replaces the older Partitioned View feature in SQL Server 2000. It offers faster query compilation, explicit errors, partition elimination during query processing, and uniform indexing on a single table. The main advantage is its manageability. By splitting large tables into smaller chunks, each chunk can be configured and managed differently according to each partition’s I/O needs. However, the single table entity is still maintained for easy administration. Here are a few of the ways data partitioning makes database management easier while still providing availability of the table during partition maintenance operations:

Partitions can be merged and split easily when configured in a sliding-window scenario. When the partition is in a sliding-window configuration, data can be managed more easily by splitting, merging, and switching the time-sliced partitions. For more information, see Partitioned Tables and Indexes in SQL Server 2005.
Data movement in and out of partitioned tables can be easily performed with little loss of availability by using the ALTER TABLE ... SWITCH statement. There are three ways data chunks can be moved:
- Switching a partition from one partitioned table to another
- Switching a table into a partition as part of a partitioned table
- Switching a partition out of a table to become its own separate table.
These three methods give flexibility in managing partitioned tables in ways that allow manipulation of data while keeping the target table available. Because there is no physical data movement, only metadata change, the partitions and tables involved in the switching are required to be homogeneous. They must have the same columns of the same data type, name, order, and collation on the same filegroup. There are additional constraints that align the tables. For more information, see the Alignment and Storage Alignment section later in this paper.

To determine the specifics of partition switching, see “Transferring Data Efficiently by Using Partition Switching” in SQL Server Books Online.
Backup and restore can be done on a partition granularity basis if the partitions reside on different filegroups. Alternatively, the database can be restored by using the new piecemeal restore feature. Piecemeal restore allows restoring of the backed up database in stages. For more information, see “Performing Piecemeal Restores” in SQL Server Books Online.

Alignment and Storage Alignment

When two objects are partitioned in the same way, they are described as being aligned. The two objects do not necessarily have to use the same partition function to be aligned. However, their partition functions must have the following in common:

The same data type for the arguments of the partition functions
The same number of partitions
The same boundary values for the partitions

When two objects are aligned as described above and also have the same partitioning scheme, they are considered storage aligned. Again, the two objects do not need to share the same partitioning scheme but must reside on the same filegroups in the same partitioned order. For more information, see Partitioned Tables and Indexes in SQL Server 2005.
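The two definitions can be expressed as simple predicates. The sketch below models a partitioned object with only the attributes named in the text; the class and its fields are hypothetical and are not a SQL Server API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PartitionedObject:
    name: str
    key_type: str                 # data type of the partition function argument
    boundaries: Tuple[int, ...]   # boundary values; partitions = boundaries + 1
    filegroups: Tuple[str, ...]   # filegroup of each partition, in partition order


def is_aligned(a: PartitionedObject, b: PartitionedObject) -> bool:
    """Same argument data type, same number of partitions, same boundary values."""
    return (
        a.key_type == b.key_type
        and len(a.boundaries) == len(b.boundaries)
        and a.boundaries == b.boundaries
    )


def is_storage_aligned(a: PartitionedObject, b: PartitionedObject) -> bool:
    """Aligned, and the partitions reside on the same filegroups in the same order."""
    return is_aligned(a, b) and a.filegroups == b.filegroups
```

For example, a table and an index that both use boundaries (100, 200, 300, 400) on an int column are aligned. They are storage-aligned only if partition 1 of each is on the same filegroup, partition 2 of each is on the same filegroup, and so on.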

Storage alignment can be further specified as physical or logical. Physical storage alignment occurs when the filegroups holding the partitions reside on the same specified physical disk drives. Logical storage alignment occurs when the filegroups reside on the same logical disk drive which can actually consist of multiple physical disks.

These two concepts are important because storage alignment can provide performance improvements for queries, and for the merge, split, and switch partitioning management features.

We recommend that the partitioned non-clustered index be logically storage-aligned to the base heap table or clustered index. There are two benefits to this design:

During index creation, it eases the I/O and memory load because sort tables are created one at a time instead of all at once. For non-aligned indexes the sort tables get created all at once. This can also affect any queries that require SQL Server to perform data sorting. For more information, see “Special Guidelines for Partitioned Indexes” in SQL Server Books Online.
During index or table maintenance, it allows partitions to be switched in and out of the table quickly while preserving the partition structure of both the table and index. If indexes are not also aligned, switching does not work.

In general, alignment and storage alignment are recommended for both the base table and any indexes. However, there are specific situations where a partitioned index might not benefit from being aligned with the base table:

The base table itself is not partitioned
The index key is not associated with the partitioning column of the base table.
The table is frequently involved in collocated joins with other tables by using different join columns. Collocated joins are joins between two aligned objects where the join predicate is on the partitioned table. The non-clustered indexes can be partitioned on different columns to participate in more collocated joins with other tables. For more information, see Partitioned Tables and Indexes in SQL Server 2005.

Aligned Partitioning and Disk Configurations

In the following example, there are two tables, T1 and T2. Each table has five partitions, and there are 20 available physical disks to be grouped in any logical volume configuration. The examples show four configurations for these two tables.

1. Simplest Configuration: Single Filegroup for Partitioned Tables

Figure 1 shows the five partitions of both tables using the same filegroup, FG1. The two tables are aligned; however, all the partitions reside on one filegroup. The filegroup resides on the logical volume U:\, which holds all 20 physical disks. The major limitation of this configuration is that the layout does not leverage data separation advantages because each partitioned data chunk resides in the same filegroup. For example, if a certain partition requires read-only access, this cannot be set on the filegroup without also affecting every other partition.

Although this configuration has many limitations, it is useful in a test environment, where the functionality is most important and further I/O tuning of the database is unnecessary.

Figure 1: Single Filegroup for Partitioned Tables

2. Segregated Configuration: Partitioned

Figure 2 shows that T1 and T2 are storage-aligned, with each of their five partitions assigned to separate filegroups. Partition 1 of T1 and T2 are both on FG1, partition 2 of T1 and T2 are both on FG2, and so on. Each of the logical volumes (I:\, J:\, K:\, L:\, and M:\) has four physical disks. The advantages of this configuration include the following:

Better separation of the data since each partition is in its own filegroup. This allows for specialized configuration of each partition.
Better I/O parallelism at the table level because the partitions are split into separate filegroups, and filegroups reside on different disks. However, if the five logical disks are striped with four underlying physical disks, the degree of I/O parallelism at the partition level is limited to a maximum of four disks.

Figure 2: Segregated Disks

3. RAID10 Configuration

Figure 3 shows T1 and T2 as storage-aligned with all five partitions of both tables in separate filegroups. This layout is very similar to the previous design, except that all the filegroups exist on one logical disk consisting of all 20 physical disks with RAID10.

Note RAID10 and other RAID levels are described in Appendix A.

In this scenario, the I/O parallelism can be used to its fullest by all partitions. Therefore, distribution of I/O workload is among 20 physical spindles instead of four at the partition level.

Figure 3: Aligned Partitioned Tables on RAID10

4. Specialized RAID System Configuration

Figure 4 has a layout that is very similar to the RAID10 configuration in Figure 3 and has the same benefits. The only difference is that the partitions on FG4 and FG5 are read-only, compressed, and on another logical disk with RAID5. This configuration allows the heavily accessed read-write data to be on a high performance, highly reliable I/O system, while the read-only data is put on a less expensive RAID system that has sufficient performance for read-only data. This design allows for finer tuning of the needs of certain data to the appropriate I/O system.

Figure 4: Specialized RAID System Configuration

Physical Server System Design

This section describes the design choices to consider when building a database system. Information about disk and volume configurations, the available choices for system components, and how each component measures up to certain design criteria are included in this section.

The Considerations for Disk and Volume Configurations

This subsection describes the various disk and volume configurations to consider when building a database system.

Sector Size vs. Stripe Size

The sector size is the smallest physical storage unit on the disk. It is a fixed size set by the disk manufacturer, currently 512 bytes. The stripe size is the unit of data that is written to and read from each disk in a RAID system. This is a configurable value that is set when designing the storage array system. A smaller stripe size allows data to be distributed across more disks, which increases I/O parallelism. Note that the size of a single SQL Server extent (64 KB) is the lower limit for the stripe size. For the same data, a larger stripe size means that the data is stored on fewer disks, which decreases the I/O distribution and the degree of parallelism. We recommend a 64 KB or 256 KB stripe size for most workloads. When the workload includes table and index range scans on tables that are larger than 100 MB, a stripe size of 256 KB allows for more efficient read-ahead.
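
The relationship between stripe size and the number of disks that serve one request can be sketched as follows. This Python example is a simplified model, not a description of any particular array: it assumes a plain round-robin striped layout, and the function name, the four-disk stripe set, and the 512 KB request size are illustrative assumptions.

```python
KB = 1024
EXTENT_BYTES = 64 * KB  # one SQL Server extent


def disks_touched(offset_bytes, length_bytes, stripe_bytes, disk_count):
    """Return the sorted disk indexes that one contiguous request touches.

    Assumes stripe unit s lives on disk (s mod disk_count), as in a simple
    striped volume. Hypothetical helper for illustration only.
    """
    if length_bytes <= 0:
        return []
    first_stripe = offset_bytes // stripe_bytes
    last_stripe = (offset_bytes + length_bytes - 1) // stripe_bytes
    return sorted({s % disk_count for s in range(first_stripe, last_stripe + 1)})


# An assumed 512 KB contiguous read (eight extents) on a four-disk stripe set.
with_64kb_stripe = disks_touched(0, 512 * KB, 64 * KB, 4)
with_256kb_stripe = disks_touched(0, 512 * KB, 256 * KB, 4)
one_extent = disks_touched(0, EXTENT_BYTES, 64 * KB, 4)

print(with_64kb_stripe)   # disks used with a 64 KB stripe size
print(with_256kb_stripe)  # disks used with a 256 KB stripe size
print(one_extent)         # an aligned extent stays on one disk
```

In this model the same request is spread over all four disks with a 64 KB stripe size but over only two disks with a 256 KB stripe size, and an aligned 64 KB extent is never split across disks.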

Physical Disks vs. Windows Basic Volumes

Physical disks are the actual hardware disk drive units that are presented to Windows. The partitions and logical drives on these physical disks are described as Windows volumes. An existing Windows volume can be extended to a larger size by extending it onto unallocated space on the same physical disk or on a different disk. When a volume is extended across multiple disks, it is described as a spanned volume. There are two types of Windows volumes: basic and dynamic. Dynamic volumes provide more features than basic volumes, such as software RAID implementations. To understand more about Windows volumes, see Basic Disks and Volumes and Dynamic Disks and Volumes.

NTFS vs. FAT vs. Raw Partitions

NTFS and FAT are both file systems offered by the Windows operating system. FAT is the older technology and has more limited file and security features than NTFS. Several SQL Server 2005 features depend on capabilities that are built into NTFS. Features that require the NTFS file system include, but are not limited to, the following:

Database snapshots (which use NTFS sparse files)
Online DBCC check functions (which use NTFS sparse files and file streams)
Compressed files
Instant file initialization
Mount points
File ACLs and other security features

Because NTFS is the newer technology, it is not available with older operating systems such as Windows 98 or DOS mode for dual-booting scenarios. However, this is a small restriction because dual booting is not common for server scenarios, and SQL Server 2005 is not supported on the older operating system versions. FAT has a file size limit of 4 gigabytes; in contrast, NTFS has a very large file size limit of 16 exabytes.

An unformatted disk partition that has no file system is referred to as a raw partition. In some scenarios, raw partitions might give a slight performance increase compared to having a file system. However, Microsoft discourages the use of raw partitions because they limit the common data access and recovery options that NTFS provides.

Volume Mount Points vs. Lettered Drives

Volume mount points, or mounted drives, are volumes that are attached to folder names on NTFS volumes and that are assigned a label or volume name instead of a drive letter. Mount points have two advantages over lettered drives:

Mount points are not limited to the 26 letters of the alphabet as lettered drives are.
Mounted drives are better protected against the system changes that occur when devices are added to or removed from the computer system.

Server Components and Design Criteria

This section describes various hardware components, and how the design criteria affect the choices of those components. For more information about the hardware components, see Appendix A.

Design Criteria

When considering the available design choices, there are some common criteria you should use to decide on the appropriate database design. There will be some trade-offs to evaluate during the design process, so it is a good idea to prioritize the design criteria. The following are a few design criteria considered in this paper:

Reliability/availability
Performance
Capacity or scalability
Manageability
Cost

Bus Bandwidth

Reliability/Availability

Larger bandwidth does not give a significant increase in reliability for smaller systems. However, for medium-sized to larger servers, larger bus bandwidth does improve the system's reliability, especially with added multi-pathing software. Reliability is improved through the redundant paths in the system and by avoiding single points of failure in hardware devices. There are many multi-pathing software solutions on the market, including the Microsoft MPIO driver.

Performance

Larger bus bandwidth is absolutely necessary for improved performance. This is especially true for systems that frequently use large block transfers and sequential I/O. A larger bus bandwidth is also necessary to support a large number of disks.

Also keep in mind that the disk is not the only user of bus bandwidth. For example, you must also account for network access.

In smaller servers that use mostly sequential I/O, PCI becomes a bottleneck with three disks. For a small server that has about eight disks performing mostly random I/O, PCI is sufficient. However, it is more common for PCI-X to be found on servers ranging from small to very large. PCI-E is now commonly found on newer desktops and might soon be more widely accepted on small servers.
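
The difference between the sequential and random cases can be checked with a rough calculation. The following Python sketch uses the theoretical peak of a 32-bit, 33 MHz PCI bus (about 133 MB per second); the per-disk sequential throughput and the random I/O rate are assumed round numbers chosen for illustration, not measurements from this paper, and the function name is hypothetical.

```python
import math

PCI_PEAK_MB_S = 133      # theoretical peak of 32-bit, 33 MHz PCI
SEQ_DISK_MB_S = 50       # assumption: sequential throughput of one disk
RANDOM_DISK_IOPS = 150   # assumption: random I/Os per second of one disk
PAGE_KB = 8              # SQL Server page size, used as the random I/O size


def disks_to_saturate(bus_mb_per_s, per_disk_mb_per_s):
    """Smallest number of disks whose combined throughput reaches the bus limit."""
    return math.ceil(bus_mb_per_s / per_disk_mb_per_s)


# Sequential case: how many disks does it take to fill the bus?
sequential_disks = disks_to_saturate(PCI_PEAK_MB_S, SEQ_DISK_MB_S)

# Random case: throughput generated by eight disks doing 8 KB random I/O.
random_disk_mb_s = RANDOM_DISK_IOPS * PAGE_KB / 1024
eight_disk_random_mb_s = 8 * random_disk_mb_s

print(sequential_disks)
print(eight_disk_random_mb_s)
```

Under these assumptions a handful of sequentially streaming disks is enough to reach the PCI limit, while eight disks doing small random I/O use only a small fraction of it, because random workloads are bounded by seek time rather than by transfer rate.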

Capacity

The capacity of bus bandwidth might be limited by the topology of the system. If the system uses direct attached disks, the number of slots limits the bus bandwidth capacity. However, for SAN or NAS systems, there is no physical limiting factor.