Trouble with RAID 5, stornvme
Good day everyone. This is my first question in the forum, and I'm honestly lost on what to do about this. I have a software RAID 5 (created with Windows Disk Management) on Windows Server 2019 (previously 2016), built from 4 Sabrent Rocket 4 Plus drives. It constantly gives me trouble: one or more drives get disconnected from the array when a reset signal is sent to them.
And here's where it gets weird: the server never fails while the drives are under heavy load (I'm using them for a SQL Server database to boost performance). But once my core operation is done (after 9 PM), I get errors in Event Viewer telling me that a device was reset. That in turn sometimes makes my RAID fail; sometimes it stays accessible, and sometimes not even that (when it takes out one or more devices).
The sequence in which this happens is as follows:
1.- I get an error telling me that a member of my RAID was reset (a reset event was emitted)
2.- Then I get a warning from my disk (an error was detected on device [...] during a paging operation)
3.- Then my RAID fails (the driver detected a controller error on [...])
4.- Then my SQL Server fails, along with anything else running on my RAID
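To make that sequence easier to spot in an exported event log, here is a minimal Python sketch of the pattern. The event IDs (129 for the reset, 51 for the paging error, 11 for the controller error) are assumptions based on the IDs Windows normally uses for these messages and should be checked against the actual log. The `find_failure_chains` helper, the 5-minute window, and the sample timestamps are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta

# Assumed event IDs; verify them against your own Event Viewer entries.
RESET_ID = 129            # reset issued to the device
PAGING_ERROR_ID = 51      # error detected during a paging operation
CONTROLLER_ERROR_ID = 11  # driver detected a controller error

CHAIN = (RESET_ID, PAGING_ERROR_ID, CONTROLLER_ERROR_ID)


def find_failure_chains(events, window=timedelta(minutes=5)):
    """Find reset -> paging error -> controller error chains.

    events: iterable of (timestamp, event_id) tuples, in any order.
    Returns a list of (chain_start, chain_end) timestamps for every
    chain that completes within `window` of its reset event.
    """
    chains = []
    step = 0      # index of the next event ID expected in CHAIN
    start = None  # timestamp of the reset that opened the current chain
    for ts, event_id in sorted(events):
        # Drop a partial chain once it is older than the window.
        if step and ts - start > window:
            step, start = 0, None
        if event_id == CHAIN[step]:
            if step == 0:
                start = ts
            step += 1
            if step == len(CHAIN):
                chains.append((start, ts))
                step, start = 0, None
        elif event_id == RESET_ID:
            # A repeated reset restarts the chain from the latest reset.
            step, start = 1, ts
    return chains


# Made-up sample data: one full chain, then a reset with late follow-ups.
t0 = datetime(2024, 1, 1, 21, 0, 0)
sample = [
    (t0, 129),
    (t0 + timedelta(seconds=10), 51),
    (t0 + timedelta(seconds=20), 11),
    (t0 + timedelta(hours=1), 129),
    (t0 + timedelta(hours=2), 51),
    (t0 + timedelta(hours=2, seconds=1), 11),
]
chains = find_failure_chains(sample)
```

With the sample data, only the first three events form a chain; the later reset is followed by errors that fall outside the window, so it counts as one of the harmless resets.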
So, what have we done to try and mitigate this issue?
- Changed the version and edition of Windows (from 2016 Standard to 2019 Datacenter; the license is pending, since we planned to upgrade it if that fixed the issue)
- Swapped the disks (we had four others for a replication server)
- Swapped the PCIe adapter cards
- Upgraded the firmware of the M.2 drives
- Disabled Windows PCIe power management by changing the Windows power plan
- Updated Windows, hoping a new driver had come out that fixes this issue
- Updated the server firmware and drivers
Yet nothing has solved our issue. This is a critical failure for us (and a weird one), since the disk that fails is random. Sometimes the RAID keeps running for a month without issue (apart from "A reset event was emitted"), and sometimes it fails 2-3 days in a row.
So, my questions are:
- Has anyone else run into the same error?
- Do I have to use another version of Windows?
- Can I change the driver (hopefully for one that fixes the issue)?
- Is there a way to prevent my devices from being reset?
- Should I give up on software RAID and just create a hardware RAID 5 (Dell R540)?
Thank you all for your time