Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to:
SQL Server
Azure SQL Managed Instance
Attribute | Value |
---|---|
Product Name | SQL Server |
Event ID | 833 |
Event Source | MSSQLSERVER |
Component | SQLEngine |
Symbolic Name | BUF_LONG_IO |
Message Text | SQL Server has encountered %d occurrence(s) of I/O requests taking longer than %d seconds to complete on file [%ls] in database [%ls] (%d) . The OS file handle is 0x%p. The offset of the latest long I/O is: %#016I64x. |
This message indicates that SQL Server has issued a read or write request from disk, and that the request has taken longer than 15 seconds to return. The SQL Server reports this error and indicates a problem with the I/O subsystem. A database management system (DBMS), such as SQL Server, relies on the timeliness of file input and output (I/O) operations. Any one of the following items may cause stuck or stalled I/O operations and adversely affect SQL Server responsiveness and performance:
These I/O problems may cause the following behavior to occur:
When an I/O operation has been pending for 15 seconds or longer, SQL Server performs the following steps:
Detects that an operation has been pending.
Writes an informational message to the SQL Server error log as outlined in the Details section.
Explanation to different sections of this informational message is given in the following table:
Message text | Description |
---|---|
<Number> occurrence(s) | The number of I/O requests that didn't complete the read or write operation in less than 15 seconds. |
File information | The complete file name, database name, and database identification (DBID) number. |
Handle | The operating system handle of the file. You can use the operating system handle with debuggers or other utilities to help track I/O request packet (IRP) requests. |
Offset | The offset of the last stuck I/O operation or the last stalled I/O operation. You can use the offset with debuggers or other utilities to help track IRP requests. Note: When the informational message is written to the SQL Server error log, the I/O operation may no longer be stuck or stalled. |
The informational message indicates that the current load may be experiencing one of the following conditions:
SQL Server records the time it initiated an I/O request and records the time the I/O was completed. If that difference is 15 seconds or longer, this condition is detected. It also means that SQL Server isn't the cause of the delayed I/O condition that this message describes and reports. This condition is known as stalled I/O. Most disk requests occur within the typical speed of the disk. This typical disk speed is frequently known as disk seek time. Disk seek time for most standard disks occurs in 10 milliseconds or less. Therefore, 15 seconds is a long time for the system I/O path to return to SQL Server. For more details, see the More Information section.
Troubleshoot this error by performing the following steps:
For a guided walk-through for diagnosing and troubleshooting SQL Server performance issues that occur because of I/O issues, see Troubleshoot slow SQL Server performance caused by I/O issues.
Stuck I/O
Stuck I/O is defined as an I/O request that doesn't complete. Frequently, stuck I/O indicates a stuck IRP. To resolve a stuck I/O condition, you must usually restart the computer or perform a similar action. A stuck I/O condition typically indicates one of the following issues:
Stalled I/O
Stalled I/O is defined as an I/O request that does complete, or that takes excessive time to complete. Stalled I/O behavior typically occurs because of one of the following reasons:
SQL Server Support handles many cases each year that involve stuck or stalled I/O problems. These I/O issues appear in different ways. I/O issues are some of the most difficult to diagnose and debug, and they require significant time and resources for debugging from Microsoft and the customer. The reporting and recording of I/O requests are designed on a per-file basis. The detection and reporting of stalled and stuck I/O requests are two separate actions.
Recording
There are two moments when a record action occurs in SQL Server. The first is when the I/O operation completes. The second moment is when the lazy writer runs. When the lazy writer runs, it checks all the pending data and pending log file I/O requests. If the I/O request exceeds the 15 seconds threshold, a record operation occurs.
Reporting
Reporting occurs in intervals that are five minutes or more apart. Reporting occurs when the next I/O request is made on the file. If a record action has occurred and five minutes or more have passed since the last report occurred, the informational message that is mentioned in the Details section is written to the SQL Server error log.
The 15 seconds threshold isn't adjustable. However, you can disable stalled or stuck I/O detection by using trace flag 830, although we don't recommend doing this.
You can disable detection for stalled and stuck I/O by using trace flag 830. To enable this flag every time that SQL Server is started, use the -T830 startup parameter. To disable detection for an instance of SQL Server that is currently running, use the following statement:
dbcc traceon(830, -1)
This setting is effective only for the life of the SQL Server process.
Note
An I/O request that becomes stalled or stuck is reported only one time. For example, if the message reports that 10 I/O requests are stalled, those 10 reports won't occur again. If the next message reports that 15 I/O requests are stalled, it means that 15 new I/O requests have become stalled.
SQL Server uses the standard Microsoft Windows API calls to read and write data. For example, SQL Server uses the following functions:
The read or write request is handled by Windows as an I/O request packet (IRP). To determine the state of the IRP, use both of the following features::
We recommend that you check for any available updates for the following items:
Contact your hardware vendors before you perform additional debugging actions. The debug session will likely involve a third-party driver, firmware, or filter driver component.
Overall, system performance can play a key role in I/O processing. You should consider the general health of the system while investigating reports of stalled or stuck I/O operations. Excessive loads can cause the overall system to be slow, including the I/O processing. The behavior of the system when the problem occurs can be a key factor in determining the root cause of the problem. For example, if the CPU usage increases or remains high while the problem occurs, it may indicate that a system process is using so much CPU that other processes are being adversely affected.
Performance counters
To monitor I/O performance, examine the following performance counters for specific I/O path information:
For example, the Average Disk Sec/Transfer time on a computer that's running SQL Server is typically less than 15 milliseconds. If the Average Disk Sec/Transfer value climbs, it indicates that the I/O subsystem isn't optimally keeping up with the I/O demand.
Be careful while using the performance counters because SQL Server takes full advantage of asynchronous I/O capabilities that heavily push the disk queue lengths. Therefore, longer disk queue lengths alone don't indicate a problem.
In Windows System Monitor, you can review the counter "Physical Disk: Disk Bytes/sec" for each affected disk and compare the rate of activity against the counters "Process: IO Data Bytes/Sec" and "Process: IO Other Bytes/sec" for each process. You do this to identify whether a specific set of processes is generating excessive I/O requests. Various other I/O-related counters in the Process object reveal more granular information. If you determine that a SQL Server instance is responsible for excessive I/O load on the server, see the next section on Indexes and Parallelism. For a detailed discussion on detecting and resolving I/O bottlenecks, see Troubleshoot slow SQL Server performance caused by I/O issues.
Indexes and parallelism
Frequently, bursts of I/O occur because an index is missing. This behavior can severely push the I/O path. A pass that uses the Index Turning Wizard (ITW) may help resolve I/O pressure on the system. If a query benefits from an index instead of a table scan, or perhaps if it uses a sort or hash, the system can gain the following advantages:
The following examples have been handled by SQL Server Support and Windows Escalation Support. These examples are intended to provide a frame of reference and help set your expectations about stalled and stuck I/O situations. They also provide a framework for understanding how a system may be affected or may respond. No specific hardware or set of drivers pose any specific risk or increased risk over another. All systems are the same in this respect.
Example 1: A log write that is stuck for 45 seconds
An attempt to write a SQL Server log file periodically gets stuck for approximately 45 seconds. The log write doesn't complete in a timely manner. This behavior creates a blocking condition that causes 30 seconds client time-outs.
The application submitted a commit to SQL Server, and the commit gets stuck as a log write pending. This behavior causes the query to continue holding locks and block incoming requests from other clients. Then, other clients start to time out. This compounds the problem because the application doesn't roll back open transactions when a query time-out occurs. This creates hundreds of open transactions that are holding locks. Therefore, a severe blocking situation occurs.
For more information about transaction handling and blocking, see the following Microsoft Knowledge Base article: 224453 Understanding and resolving SQL Server blocking problems
The application services a website by using connection pooling. As more connections become blocked, the website creates more connections. These connections become blocked, and the cycle continues.
The log write takes approximately 45 seconds to complete. However, by this time, hundreds of connections are backed up. The blocking problems cause several minutes of recovery time for SQL Server and the application. Combined with application problems, the stalled I/O condition has a very negative effect on the system.
Resolution
The problem is tracked to a stuck I/O request in a Host Bus Adapter (HBA) driver. The computer has multiple HBA cards with failover support. When one HBA is behind or isn't communicating with the Storage Area Network (SAN), the "retry before failover" time-out value is configured to 45 seconds. When the time-out exceeds, the I/O request is routed to the second HBA. The second HBA handles the request and quickly gets completed. To help prevent such stall conditions, the hardware manufacturer recommends a "retry before failover" setting of five seconds.
Example 2: Filter driver intervention
Many antivirus software programs and backup products use I/O filter drivers. These I/O filter drivers become part of the I/O request stack and have access to the IRP request. Microsoft Product Support Services has seen various issues from bugs that create stuck I/O conditions or stalled I/O conditions in a filter driver implementation.
One such condition is a filter driver for backup processing that allows backup of the files that are open when the backup occurs. The system administrator has included the SQL Server data file directory in the file backup selections. When the backup occurs, the backup tries to gather the correct image of the file at the time backup started. Doing this delays I/O requests. The I/O requests are allowed to complete only one at a time as the software handles them.
When the backup starts, SQL Server performance drops dramatically as the I/Os of SQL Server are forced to complete one at a time. The one at a time logic is such that the I/O operation can't be performed asynchronously, which compounds the issue. Therefore, when SQL Server expects to post an I/O request and continue, the worker is stuck in the read or write call until the I/O request is completed. The actions of the filter driver effectively disable the processing tasks such as SQL Server read-ahead. Additionally, another bug in the filter driver leaves the one at a time actions in the process, even when the backup is completed. The only way to restore SQL Server performance is to restart SQL Server so that the file handle is released and reacquired without the filter driver interaction.
Resolution
To resolve this problem, the SQL Server data files are removed from the file backup process. The software manufacturer has corrected the problem that left the file in the "one at a time" mode.
Example 3: Hidden errors
Many high-end systems have multichannel I/O paths to handle load balancing or similar activities. Microsoft Product Support has found problems with the load balancing software where an I/O request fails, but the software doesn't handle the error condition correctly. The software can attempt infinite retries. The I/O operation gets stuck, and SQL Server can't complete the specified action. Much like the log write condition described earlier, many poor system behaviors can occur after such a condition wedges the system.
Resolution
To resolve this problem, restart the SQL Server. However, sometimes you need to restart the operating system to restore processing. We also recommend that you obtain a software update from the I/O vendor.
Example 4: Remote storage, mirroring, and raid drives
Many systems use mirroring or adopt similar steps to prevent data loss. Some systems that use mirroring are software-based, and some are hardware-based. The situation that is typically discovered by Microsoft Support for these systems is increased latency.
An increase in the overall I/O time occurs when the I/O must finish before it's considered complete. For remote mirror installations, network retries can become involved. When drive failures occur, and the raid system is being rebuilt, the I/O pattern can also be interrupted.
Resolution
Strict configuration settings are required to reduce latency to mirrors or to raid rebuild operations.
Example 5: Compression
Microsoft doesn't support SQL Server data files and log files on compressed drives. NTFS compression isn't safe for SQL Server because NTFS compression breaks Write Ahead Logging (WAL) protocol. NTFS compression also requires increased processing for each I/O operation. Compression creates "one at a time" like behavior that causes severe performance issues to occur.
Resolution
To resolve this problem, uncompress the data and the log files.
For more information, see Support for databases on compressed volumes.
PAGEIOLATCH_* and writelog waits in sys.dm_os_wait_stats dynamic management views (DMV) are key indicators to investigate I/O path performance. If you see significant PAGEIOLATCH waits, it means that SQL Server is waiting on the I/O subsystem. A certain amount of PAGEIOLATCH waits is typical and expected behavior. However, if the average PAGEIOLATCH wait times are consistently greater than 10 milliseconds, you should investigate why the I/O subsystem is under pressure. For more information, see the following documents:
References
SQL Server requires that systems support "guaranteed delivery to stable media" as outlined under the SQL Server I/O Reliability Program Requirements. For more information about the input and output requirements for the SQL Server database engine, visit Database Engine Input/Output requirements.
For more information about I/O errors, see Microsoft SQL Server I/O Basics, Chapter 2.
Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register today