Server 2019 Storage Pools cannot handle a failed drive in a mirror setup

Anonymous
2024-07-03T06:11:30+00:00

Storage Spaces cannot handle a failed drive in a mirror.

We have a data-centre-hosted Windows Server 2019 machine with 6 drives: 4 x 1TB SSDs and 2 x 2TB HDDs.

The SSDs are configured in a different system and are not part of the issue.

The two 2TB HDD drives were configured in a Storage Pool.
A 2TB fixed-provisioned mirror virtual disk was created on that pool.

The partition was formatted as ReFS, with 4 VHDX files for virtual machines stored on the volume.
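For reference, the original build was essentially the following. This is a reconstruction, not the exact commands; the pool and virtual disk names ("HDDPool", "MirrorVD") are placeholders:

    # Reconstruction of the original layout; names are illustrative.
    $hdds = Get-PhysicalDisk -CanPool $true | Where-Object MediaType -eq 'HDD'
    New-StoragePool -FriendlyName "HDDPool" `
        -StorageSubSystemFriendlyName "Windows Storage*" `
        -PhysicalDisks $hdds
    # Fixed-provisioned two-way mirror filling the pool
    New-VirtualDisk -StoragePoolFriendlyName "HDDPool" -FriendlyName "MirrorVD" `
        -ResiliencySettingName Mirror -ProvisioningType Fixed -UseMaximumSize
    # Partition and format the virtual disk as ReFS
    Get-VirtualDisk -FriendlyName "MirrorVD" | Get-Disk |
        Initialize-Disk -PassThru |
        New-Partition -UseMaximumSize -AssignDriveLetter |
        Format-Volume -FileSystem ReFS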

Been running beautifully for 3+ years without issue. Until a drive failed...

This would normally not be an issue. Previous experience says: remove the failed drive, install a new one, add it to the set/pool, configure it to take over from the failed disk, and the system rebuilds. As long as the replacement is the same size as the failed disk there shouldn't be any problems, so we made sure it was the identical drive model, and everything looked good.

The failed drive was retired from the pool automatically, so I added the new drive to the pool and told it to repair the virtual disk... which started and then stopped almost immediately.
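The steps I ran map to roughly this sketch (same placeholder names as above):

    # Replacement flow; "HDDPool"/"MirrorVD" are placeholder names.
    # The failed disk had already been marked Retired by the pool.
    Get-PhysicalDisk | Select-Object FriendlyName, OperationalStatus, HealthStatus, Usage
    # Add the new, identically sized drive to the pool
    $new = Get-PhysicalDisk -CanPool $true
    Add-PhysicalDisk -StoragePoolFriendlyName "HDDPool" -PhysicalDisks $new
    # Start the rebuild -- this is the step that stops almost immediately
    Repair-VirtualDisk -FriendlyName "MirrorVD"
    Get-StorageJob    # the repair job appears, then stops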

On digging, the error on the repair was "Not enough space". What??? How??
The original drives are 2 000 398 934 016 bytes... The new drive is exactly the same. So what is going on??

Then I noticed something... The original drives each show 257MB of free space. I assume that is just a remnant of the mirror process and some sort of alignment overhead... It has never been an issue before; I've seen it multiple times on multiple systems.
The new drive (with nothing on it) that has just been attached to the pool suddenly shows 783MB already used... which means (according to the system) there isn't enough space for the mirror copy of the virtual disk. It comes up roughly 500MB short, so the repair can't continue.
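The mismatch shows up when you compare each pool disk's Size against its AllocatedSize; the figures in the comments are the ones from this system:

    # Per-disk allocation inside the pool; placeholder pool name.
    Get-StoragePool -FriendlyName "HDDPool" | Get-PhysicalDisk |
        Select-Object FriendlyName, Size, AllocatedSize
    # Original disks: ~257MB left unallocated
    # New disk:       783MB already allocated before anything was rebuilt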

Shrink the virtual disk? "The Windows Storage Provider does not support shrinking virtual disks."
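For the record, that error comes from a shrink attempt along these lines (the target size is illustrative):

    # Any -Size below the current size fails with the quoted error.
    Resize-VirtualDisk -FriendlyName "MirrorVD" -Size 1.8TB
    # -> "The Windows Storage Provider does not support shrinking virtual disks."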

Ok, maybe something went a bit weird; just remove, nuke and reattach the new drive... Except you can't remove a functioning drive from a pool while a virtual disk in it is in a degraded state.
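And the removal attempt that gets blocked ("NewHDD" is a placeholder friendly name):

    # Fails while the pool still contains a degraded virtual disk.
    $new = Get-PhysicalDisk -FriendlyName "NewHDD"
    Remove-PhysicalDisk -StoragePoolFriendlyName "HDDPool" -PhysicalDisks $new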

So now I have a failed mirror: I can't rebuild it, I can't remove the new drive, I can't use the new drive, and I'm paying for a drive I can do nothing with.

Any suggestions? I could probably get this working if I could figure out why the new drive suddenly has space used on it for no good reason, and clear that usage.

And before anyone starts telling me "just restore from backup" or some other variant of that... what is the point of having a mirror if, every time I have a failure, I have to restore from backup? If that is the case, I may as well use a stripe, get the performance benefit, and restore from backup when a drive fails.

tl;dr

  1. Failed drive results in degraded mirror.
  2. Not enough space on the new, identical drive to rebuild the mirror.
  3. Can't shrink the virtual disk: "The Windows Storage Provider does not support shrinking virtual disks."
  4. Can't remove the new drive because of the degraded mirror in the pool.
  5. See #1



3 answers

  1. Anonymous
    2024-07-03T08:17:08+00:00

    Hi Demon,

    Hope you're doing well.

    1. First, ensure that you've correctly identified the failed drive. If one of the drives is marked as faulty, it's essential to know which one. You mentioned that the original drives have 257MB of free space. This might be related to the issue.
    2. If both drives are healthy, try running the 'Repair-VirtualDisk' cmdlet in PowerShell to repair the virtual disk. This command rebuilds the data on the failed or degraded physical disks:

    Repair-VirtualDisk -FriendlyName "MyVirtualDisk"

    3. If you suspect the pool is corrupt, consider using 'Get-VirtualDisk' and related cmdlets to check the virtual disk's health across the pool (see the sketch after this list). If the pool is indeed corrupt, you might need to rebuild it and then restore data from backups.
    4. Verify that the new drive has sufficient space to accommodate the mirror. The unexpected 783MB usage on the new drive might be causing the "Not enough space" error during repair. Unfortunately, Storage Spaces doesn't support shrinking virtual disks directly.
    5. If all else fails, consider data recovery options from the existing drives. While it's frustrating not to have a seamless mirror rebuild, recovering data from the healthy drive might be necessary.
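    A minimal health-check sketch for step 3, reusing the placeholder name "MyVirtualDisk" from the command above:

    # Health overview of the pool, the virtual disk, and its backing disks.
    Get-StoragePool | Select-Object FriendlyName, HealthStatus, OperationalStatus
    Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus, Size, FootprintOnPool
    Get-VirtualDisk -FriendlyName "MyVirtualDisk" | Get-PhysicalDisk |
        Select-Object FriendlyName, HealthStatus, Usage, Size, AllocatedSize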

    I hope these steps help you resolve the issue.

    Best Regards

  2. Anonymous
    2024-07-03T08:50:26+00:00
    1. I have identified the correct device, removed and replaced the device.
    2. Repair-VirtualDisk comes back with "Not enough available capacity", which is the heart of the problem.
    3. Pool is functional. I can continue using the server I just can't get the mirror operational again. So the next failure will bring everything to a standstill while I recover from backup. Frustrating because I shouldn't need to.
    4. The drive was new and unformatted. The only explanation for the 783MB of used space is that attaching it to the pool has stuck some configuration metadata on the drive. I don't know, but I can't access the drive to wipe it clean and confirm. And if it is configuration metadata, how did the previous drive accommodate it alongside the virtual disk?
    5. As mentioned in the main post: if I cannot replace a drive in a mirror seamlessly, what is the point of the mirror?

    There is no reasonable explanation or solution for this issue yet. I've hunted online, and most posts are a variation of "restore from backup", which half the time negates the whole point of having a mirror in the first place. Yes, the client can continue, but I cannot fully recover from a failure, so the next one will kill the system. It has bought me some time, but it is time I shouldn't have needed in the first place if it all just worked.

    Knowing what I know now, if I ever do this again, I would create the virtual disk and leave 5GB on the drives as open capacity. Then, if a drive is replaced, instead of having to match sector for sector (as a traditional RAID1 would), there is a "buffer" to absorb the configuration metadata the pool appears to stick on the drive before making it usable.
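    In PowerShell terms, that would mean sizing the virtual disk explicitly instead of using -UseMaximumSize. A sketch, with placeholder names and an assumed 5GB of slack:

    # Leave headroom for the pool metadata a replacement disk will carry.
    # For a two-way mirror on two disks, the per-disk footprint equals the
    # virtual disk size, so subtract the slack from the smallest drive.
    $slack = 5GB
    $perDisk = (Get-PhysicalDisk | Where-Object MediaType -eq 'HDD' |
        Measure-Object -Property Size -Minimum).Minimum
    New-VirtualDisk -StoragePoolFriendlyName "HDDPool" -FriendlyName "MirrorVD" `
        -ResiliencySettingName Mirror -ProvisioningType Fixed `
        -Size ($perDisk - $slack)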

  3. Anonymous
    2024-11-28T16:44:47+00:00

    We have run into the same issue. Is there any resolution to this yet?
