What component of I/O subsystem caused database corruption in my SQL Server

IT Researcher 46 Reputation points
2024-07-11T05:49:49.4266667+00:00

As per Error 824 that has occurred in my SQL error Log, I have found that the I/O subsystem is the cause of the database corruption.

Error 824

However, I want to dig further and understand which specific component of the I/O subsystem caused the corruption.

Is there a methodology to identify this? How can I find out which component of  I/O has caused the corruption to my database?

Additionally, I have found information on how to check which component of the I/O subsystem has caused latency in the link below.

I/O Latency

However, this does not address whether I/O subsystem latency causes database corruption.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,030 questions
{count} votes

2 answers

Sort by: Most helpful
  1. LucyChenMSFT-4874 5,060 Reputation points
    2024-07-11T08:16:38.4466667+00:00

    Hi @IT Researcher ,

    Thank you for your reaching out and welcome to Microsoft Q&A!

    Is there a methodology to identify this? How can I find out which component of I/O has caused the corruption to my database?

    Do you mean you want to find out which hardware or driver issue in the I/O path caused the error 824?

    We recommend you use the SQLIOSim to check hardware diagnostic. It simulates a very heavy SQL Server workload and should do a far better of discovering flaws in your hardware setup than individual hardware vendors' diagnostics will. You can use it to expose hardware as the problem in difficult-to-diagnose corruption problems.

    In addition, please refer to this article, we can get the information below and it recommends use Stellar Repair for MS SQL to fix issue. The SQL Recovery software repairs corrupt SQL database files and recovers tables, indexes, keys, rules, constraints, triggers, and defaults. The understandable user interface of this software makes it possible to recover various objects present in MDF and NDF database files without much effort.:

    User's image

    By the way, it seems that the second link you provided is expired. For your issue 'this does not address whether I/O subsystem latency causes database corruption'. I checked the official document of I/O fundamentals, it doesn't mention the information of I/O subsystem latency will cause database corruption and error message 824.

    Best regards,

    Lucy Chen


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    https://docs.microsoft.com/en-us/answers/support/email-notifications


  2. Erland Sommarskog 112.8K Reputation points
    2024-07-11T09:36:31.4133333+00:00

    That is not a simple task. And only including the SQL Server tag is maybe not the best idea, since this sort of troubleshooting requires a different skill set than even a very seasoned SQL Server DBA might have.

    If the corruption occurs often enough so that is repeatable, then you can try replacing one component at a time. But this is a quite tedious operation, since the I/O subsystem consists of many parts, particularly if the database files are on a SAN.

    I recall that I played this game on a desktop at home many years ago, where files were randomly changed when copied. I removed one component in the computer at a time, and I could eventually track it down to a specific memory stick. I would expect this to be a lot more complex in the case of a server with files on a SAN. And if the corruption only occurs like once a week, it takes forever.

    Trying SQLIOSim to stress the hardware as Lucy suggested is not a bad idea. Because if you can't provoke the error to happen at will, it is very difficult to ensure that you have gotten rid of the evil. With less than you replace all your hardware.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.