Operating System Error (665 – File System Limitation) Not just for DBCC Anymore
The operating system error 665, indicating a file system limitation has been reached continues to gain momentum beyond DBCC snapshot files. Over the last ~18 months I have reproduced issues with standard database files, BCP output, backups and others.
We have posted previous blogs talking about the NTFS attribute design and associated limitations (665) as the storage space for the attribute structures becomes exhausted.
Online DBCC has been susceptible to this limitation leveraging a secondary stream for copy-on-write activities. The sparse nature of DBCC snapshot or a snapshot database can drive attribute exhaustion.
As space is acquired the disk storage location(s) and size(s) are stored in the attribute structures. If the space is adjacent to a cluster already tracked by the file the attributes are compressed into a single entry, spanning the entire size. However, if the space is fragmented it has to be tracked with multiple attributes.
The 665 issue can pop up with larger file sizes. As the file grows it acquires more space. During the space acquisition the attributes are used to track this space.
Repro 1 – DBCC Snapshot or Snapshot Database
The copy-on-write logic acquires new space as pages are dirtied in the main database. As writes occur to the snapshot target more attributes are used. The space allocations and clusters are by definition and design stored in spare locations.
Repro 2 – Database files
I can insert data into a database with a smaller auto grow size, each acquiring new disk space. If the disk space is not acquired in strict, cluster adjacent order the growth is tracked with separate attributes. After millions of grows from a large import or index rebuild (usually around 300GB) I can exhaust the attribute list and trigger the 665 error.
Repro 3 – BCP
BCP extends the output file in the process of simply writing to the file. Each block written has to acquire disk space and it is reliant on adjacent disk space allocation to accommodate a large file.
One customer was trying to use a copy of BCP per CPU and the query option to partition the output streams and improve the export performance. This can work very well but in this case it back-fired. The output location was shared and as each copy of BCP was writing data it caused them to leap frog each other on the same physical media. Each of the BCP output streams quickly exhausted the attribute storage as none of them were acquiring adjacent storage.
What Should I Do?
Testing has shown that defragmentation for the physical media may help reduce the attribute usage. Just be sure your defragmentation utility is transactional. (Note: defragmentation is a different story on SSD media and typically does not address the problem. Copying the file and allowing the SSD firmware to repack the physical storage is often a better solution.)
Copy - Doing a file copy may allow better space acquisition.
Use SQL Server 2014 for DBCC online operations. SQL Server 2014 no longer uses the secondary stream but a separate, sparse file marked for delete on close. This may reduce the shared attribute storage required by a secondary stream.
Adjust the auto growth to acquire sizes conducive for production performance as well as packing of attributes.
Format the drive using the larger NTFS metadata, file tracking structures providing a larger set of attribute structures. https://support.microsoft.com/en-us/kb/967351
Use ReFS instead of NTFS. ReFS does not contain the same design as NTFS.
Adjust utility write sizes, for example the BCP buffer size change: I just discovered that the fix list for SQL Server 2012 SP2 does not appear to contain the fix information. The fix changes the bcpWrite (internal API used by ODBC driver and BCP.exe) from a 4K write size to 64K write size.
Backup – Carefully plan the number of files (stripe set) as well as transfer and block sizes.
Bob Dorr - Principal SQL Server Escalation Engineer
Comments
Anonymous
June 10, 2015
The comment has been removedAnonymous
June 11, 2015
Backup – Carefully plan the number of files (stripe set) as well as transfer and block sizes. We encountered the 665 error on older SQL server version during larger database backup. Is there any document or KB that point out the proper way of designing transfer and block size that address this issue?Anonymous
June 17, 2015
For existing system is there a way we can check how much of FileRecord has been used on a file?Anonymous
November 10, 2015
Saving the DB in two parts solved the issue.Anonymous
June 04, 2016
It seems that 4k allocation unit/cluster size NTFS filesystems are more susceptible to errors 665 than 64k cluster size filesystems, assuming the same file growth patterns. Is that accurate?