Good questions. We are continually analyzing how to treat disks that aren't failed but aren't working as expected. As capacity disks they can affect the system, but as cache disks the effect is greater. We created the outlier detection feature to make these "marginal disks" easier to spot.
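To illustrate the general idea of outlier detection (this is a hypothetical sketch, not the actual S2D implementation), one simple approach is to compare each disk's average I/O latency against the pool median using the median absolute deviation, and flag disks that are dramatically slower than their peers:

```python
# Hypothetical sketch of marginal-disk outlier detection (NOT the actual
# S2D algorithm): flag disks whose average latency is far from the pool
# median, using the median absolute deviation (MAD).
import statistics

def find_latency_outliers(latencies_ms, threshold=3.5):
    """Return disks whose latency deviates strongly from the pool.

    latencies_ms: per-disk average latency, e.g. {"disk0": 1.2, ...}
    threshold: modified z-score cutoff (3.5 is a common rule of thumb).
    """
    values = list(latencies_ms.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:  # all disks essentially identical; nothing to flag
        return []
    outliers = []
    for disk, v in latencies_ms.items():
        modified_z = 0.6745 * (v - med) / mad
        if modified_z > threshold:  # only flag disks that are *slower*
            outliers.append(disk)
    return outliers

pool = {"disk0": 1.1, "disk1": 1.3, "disk2": 1.2,
        "disk3": 45.0, "disk4": 1.0}
print(find_latency_outliers(pool))  # → ['disk3']
```

The disk names and thresholds here are illustrative assumptions; the point is simply that a "marginal" disk is defined relative to its peers, not by an absolute failure signal.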
The next question is what the system can do about them. At first blush the answer is, "Get rid of them!" However, this is fraught with peril. If more than one disk is having a problem and you get rid of them, you can lose data, because those disks may hold the only copies of specific pieces of data. That's not good.
What if you get rid of one, and right after that another has a problem? Do you get rid of that one too? That raises the same data issue just described, which is not necessarily a good thing. Sometimes a slow disk with good data is better than simply cutting disks out of the system. We are working on smart ways to take all of this into consideration in our storage health automation, but we are being cautious to ensure we are not putting any data at greater risk.
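The data-safety reasoning above can be sketched as a simple check (a hypothetical illustration, not S2D's actual logic): before evicting a set of suspect disks, verify that every slab of data would still have at least one remaining copy.

```python
# Hedged sketch of the "don't orphan data" check described above
# (hypothetical, NOT S2D's actual logic): before evicting suspect disks,
# find any slab whose every copy lives on a suspect disk.

def slabs_at_risk(slab_copies, suspects):
    """slab_copies: {slab_id: set of disks holding a copy of that slab}
    suspects: set of disks being considered for removal.
    Returns the slabs that would lose their last copy (empty == safe)."""
    return [slab for slab, disks in slab_copies.items()
            if disks <= suspects]  # every copy is on a suspect disk

slabs = {
    "A": {"disk0", "disk1"},
    "B": {"disk1", "disk2"},
    "C": {"disk2", "disk3"},
}
# Evicting disk2 alone is safe, but evicting disk2 and disk3 together
# would orphan slab "C" -- the two-marginal-disks scenario above.
print(slabs_at_risk(slabs, {"disk2"}))           # → []
print(slabs_at_risk(slabs, {"disk2", "disk3"}))  # → ['C']
```

The slab and disk names are made up for the example; the takeaway is that evicting each disk may look safe in isolation while evicting the set is not.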
With regard to retiring a disk, part of that process is to move the data on that disk somewhere else (if there is spare capacity to do so). If the disk is slow, that takes time.
Removing the disk from the pool is another option, but success depends somewhat on the state of the disk itself. Since the disk is not behaving as expected, anything the software does with it is somewhat unpredictable.
Last resort is physically removing it, which you did.
One last thing, which may not be directly relevant to your situation but is relevant to the marginal-disk discussion: we have seen disks that are only marginally responsive. The customer removes them and sends them back to the seller, which tests them and finds nothing wrong. There are devices out there that only become marginal under certain I/O patterns or at certain levels of I/O stress (and S2D can throw I/O at a device faster than most anything else out there). I'm not aware of any device that keeps performance history, or error records based on performance, on the disk itself, so the disk manufacturers have no evidence that the disk was behaving poorly.
Sorry for your frustration and thank you for the description and feedback. We continue to strive to improve.
I hope this helps,
Steven Ekren
Senior Program Manager
Windows Server and Azure Stack HCI
Microsoft