Storage Spaces cannot handle a failed drive in a mirror.
We have a data centre hosted Windows Server 2019 machine with 6 drives: 4 x 1TB SSD and 2 x 2TB HDD.
The SSDs are configured in a different system and are not part of the issue.
The two 2TB HDD drives were configured in a Storage Pool.
A 2TB fixed-provisioned mirror virtual disk was created on that pool.
The partition is formatted as ReFS and holds 4 x VHDX files for virtual machines.
Been running beautifully for 3+ years without issue. Until a drive failed...
This would normally not be an issue. Previous experience says you simply remove the failed drive, install a new drive, add it to the set/pool, configure it to take over from the failed drive, and the system rebuilds. As long as the replacement is the same size as the failed disk, there shouldn't be any issues, so we made sure the new drive was identical to the old one and everything looked good.
The failed drive was retired from the pool automatically, so I added the new drive to the pool and then started a repair of the virtual disk... which started and then stopped almost immediately.
On digging, "Not enough space" was the error on the repair. What??? How??
The original drives are 2 000 398 934 016 bytes... New drive is exactly the same. So what is going on??
Then I noticed something... The original drives have 257MB free space. I assume it is just some remnants from the mirror process and some sort of alignment... Never been an issue before. I've seen this multiple times with multiple systems.
The new drive (with nothing on it) that has just been attached to the pool suddenly has 783MB already used... Which means (according to the system) there isn't enough space to hold the mirror copy of the virtual disk. It is 500MB-odd short, so the repair can't continue.
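To put numbers on that shortfall, here's a rough sketch in Python using only the figures above. It assumes "MB" means MiB (2**20 bytes) and that the mirror copy needs exactly as much space on the new drive as it occupies on the surviving one; the function name and constants are mine, not anything Storage Spaces reports.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumption: "MB" in the pool UI means MiB (2**20 bytes).
MIB = 2**20
DRIVE_BYTES = 2_000_398_934_016  # reported size of the old and new drives
OLD_FREE_MIB = 257               # free space left on the surviving drive
NEW_USED_MIB = 783               # space already used on the new, empty drive


def mirror_shortfall_mib(old_free_mib: int, new_used_mib: int) -> int:
    """Return how many MiB the new drive is short of holding the mirror copy.

    The copy occupies (capacity - old_free) on the surviving drive, while the
    new drive only offers (capacity - new_used). With equal capacities the
    capacity cancels out, leaving new_used - old_free (never below zero).
    """
    return max(0, new_used_mib - old_free_mib)


shortfall = mirror_shortfall_mib(OLD_FREE_MIB, NEW_USED_MIB)
needed_bytes = DRIVE_BYTES - OLD_FREE_MIB * MIB
available_bytes = DRIVE_BYTES - NEW_USED_MIB * MIB

print(f"needed on new drive:    {needed_bytes:,} bytes")
print(f"available on new drive: {available_bytes:,} bytes")
print(f"shortfall:              {shortfall} MiB")
```

With these inputs the shortfall works out to 526 MiB, which matches the "500MB-odd" gap: the drives being byte-for-byte the same size doesn't help if the new one starts with more space already used than the old one has free.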
Shrink the virtual disk? "The Windows Storage Provider does not support shrinking virtual disks."
Ok, maybe something went a bit weird, just remove, nuke and reattach the new drive... Can't remove a functioning drive from a pool with a virtual disk in a degraded state.
So now I have a failed mirror, I can't rebuild the mirror, can't remove the new drive, can't use the new drive and now I'm paying for a drive I can do nothing with.
Any suggestions? I could probably get it to work if I can figure out why the new drive suddenly has some of the space used on it for no good reason and clear the usage.
And before anyone starts telling me "Just restore from backup" or some other random variant of that... What the heck is the point of having a mirror if every time I have a failure, I have to restore from backup? If that is the case, I may as well use a stripe, get the performance benefit and restore from backup when a drive fails.
tl;dr
- Failed drive results in degraded mirror.
- Not enough space on new identical drive to rebuild mirror.
- Can't shrink the virtual disk because "The Windows Storage Provider does not support shrinking virtual disks."
- Can't remove new drive because of degraded mirror in pool.
- See #1