Data Loss Event ID 25 "The shadow copies on Volume E: were deleted because shadow copy storage could not grow in time."

Question

Saturday, February 2, 2019 12:50 AM | 5 votes

It appears that this has been an issue for over 10 years now, and there is still no resolution.

Once the allocated shadow copy storage fills up and you then push a lot of disk I/O, your snapshots will eventually be deleted and you will incur data loss.

Here is how:

When one first creates a VSS snapshot, it shows 50GB of available shadow storage space, which is great:

C:\Users\root>vssadmin list shadowstorage
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.
Shadow Copy Storage association
   For volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Shadow Copy Storage volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Used Shadow Copy Storage space: 0 bytes (0%)
   Allocated Shadow Copy Storage space: 50 GB (0%)
   Maximum Shadow Copy Storage space: 38.2 TB (100%)

But as time goes by, this will fill up, and from then on the VSS service only tries to stay ahead by about 300MB.

So what happens if you push too much I/O, and the VSS service can't allocate space as fast as data is being written? Does it re-allocate the space requested when the disk is idle?  No.  Does it pause the disk if it needs to allocate more shadow storage?  No.

It will simply DELETE ALL of your snapshots.

Eventually, you will fill up the 50GB of shadow copy allocated storage, like so:

C:\Users\root>vssadmin list shadowstorage
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.
Shadow Copy Storage association
   For volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Shadow Copy Storage volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Used Shadow Copy Storage space: 50 GB (0%)
   Allocated Shadow Copy Storage space: 50.3 GB (0%)
   Maximum Shadow Copy Storage space: 38.2 TB (100%)

And from this point on, it will barely stay ahead, except by a measly 300MB, which isn't much when you're dealing with large volumes (the one above is about 40TB).

Here's a snapshot that is just waiting to disappear any day now...

C:\Users\root>vssadmin list shadowstorage
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Shadow Copy Storage association
   For volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Shadow Copy Storage volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Used Shadow Copy Storage space: 151 GB (0%) <-- any moment my snapshot could be destroyed!
   Allocated Shadow Copy Storage space: 151 GB (0%)
   Maximum Shadow Copy Storage space: 38.2 TB (100%)

 

Note that it stops showing decimals after 100GB, but I timed it as the space increases and the allocated space is still staying around 300MB ahead of the used space.
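For anyone who wants to track that gap without reading the numbers by eye, here is a rough sketch in Python that parses the text printed by vssadmin list shadowstorage and returns allocated minus used, in bytes. The function names are my own, and it assumes English output, a period as the decimal separator, and a single storage association in the text. Since vssadmin rounds the figures (and drops decimals above 100GB), the result is only as precise as what is printed.

```python
import re

# Binary multiples for the unit suffixes vssadmin prints.
_UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Matches the Used / Allocated / Maximum lines of one storage association.
_LINE = re.compile(
    r"^\s*(Used|Allocated|Maximum) Shadow Copy Storage space:\s*"
    r"(UNBOUNDED|[\d.]+)\s*(bytes|KB|MB|GB|TB)?",
    re.MULTILINE,
)


def parse_shadowstorage(text):
    """Return a dict with 'used', 'allocated' and 'maximum' in bytes.

    'maximum' is None when vssadmin reports UNBOUNDED.
    """
    sizes = {}
    for label, number, unit in _LINE.findall(text):
        if number == "UNBOUNDED":
            sizes[label.lower()] = None
        else:
            sizes[label.lower()] = int(float(number) * _UNITS[unit])
    return sizes


def headroom_bytes(text):
    """Allocated minus used shadow copy storage, in bytes."""
    sizes = parse_shadowstorage(text)
    return sizes["allocated"] - sizes["used"]
```

Feeding it the second listing above (50 GB used, 50.3 GB allocated) gives a headroom of roughly 0.3 GB, which matches the ~300MB I timed by hand.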

If you happen to push too much I/O, you will get Event ID 25 in your event log and ALL of your snapshots will be deleted.

This is not acceptable, and I've seen it on multiple systems from Windows Server 2008 through Windows Server 2016.  I need a resolution to this issue because we have customer data at stake, and we cannot move forward unless we know, for a fact, that snapshots will not be magically deleted because the hard drive decided to be slow one day, for some arbitrary reason.

I've also read the Microsoft suggestion of putting the shadow storage on another volume. That is a band-aid and NOT a good solution, as it introduces failure modes of its own. If for some reason the SAN becomes unavailable, you will lose your snapshots. If for some reason the disk degrades on the second volume, you will lose your snapshots. If for some reason the second volume runs out of space, you will lose your snapshots.

It might also be worth mentioning that the registry setting MinDiffAreaFileSize was also suggested by Microsoft, but that does nothing to alleviate the issue.

Can we please get a resolution to this issue?

All replies (9)

Monday, February 4, 2019 1:31 AM ✅Answered | 2 votes

I believe I may have found the solution.

/en-us/windows/desktop/api/vsmgmt/ne-vsmgmt-_vss_protection_level

The URL above shows that VSS uses a Volume Protection Level that DEFAULTS to deleting snapshots in order to preserve I/O speed on the original volume.

This is utterly absurd. Not only that, you cannot set this property using VSSADMIN or DISKSHADOW, at least not that I can find. Microsoft, you screwed up here!

So here is how you fix it:

1) Download AlphaVSS.Common.dll (they expose all VSS COM functionality in .NET objects)
    http://alphavss.alphaleonis.com/

2) Set up the following function. I'll give it here in PowerShell, because that will most likely be the environment it is used in. The PowerShell version required here is 5.1.

Add-Type -Path .\AlphaVSS.Common.dll

function SetVolumeProtectionLevel
{
    [OutputType([Alphaleonis.Win32.Vss.VssVolumeProtectionInfo])]
    param
    (
        [Parameter(Mandatory=$true)][string]$volume,
        [Parameter()][Alphaleonis.Win32.Vss.VssProtectionLevel]$protectionLevel = [Alphaleonis.Win32.Vss.VssProtectionLevel]::Snapshot
    )

    # Typed as the AlphaVSS interfaces returned by the calls below
    [Alphaleonis.Win32.Vss.IVssSnapshotManagement]$m = $null;
    [Alphaleonis.Win32.Vss.IVssDifferentialSoftwareSnapshotManagement]$d = $null;
    try
    {
        $vss = [Alphaleonis.Win32.Vss.VssUtils]::LoadImplementation();
        $m = $vss.GetSnapshotManagementInterface();
        $d = $m.GetDifferentialSoftwareSnapshotManagementInterface();
        $d.SetVolumeProtectionLevel($volume, $protectionLevel);
        return $d.GetVolumeProtectionLevel($volume);
    }
    finally
    {
        if($d -ne $null) { $d.Dispose(); }
        if($m -ne $null) { $m.Dispose(); }
    }
}

3) Now that you have a function that can set the protection level, go ahead and call it.
    SetVolumeProtectionLevel "E:\" Snapshot

This will change the volume so it will protect the snapshot instead of protecting the I/O throughput on the original volume.


Sunday, February 3, 2019 6:18 PM | 1 vote

I found a way to force expand the ShadowCopy Storage, but it isn't very elegant.

C:\> DISKSHADOW.EXE
DISKSHADOW> SET CONTEXT PERSISTENT NOWRITERS
DISKSHADOW> ADD VOLUME E:
DISKSHADOW> CREATE
DISKSHADOW> DELETE SHADOWS ID %VSS_SHADOW_1%
DISKSHADOW> EXIT

This will create a snapshot, which expands the storage by 50GB, then deletes the snapshot. The side effect is it leaves the 50GB allocated.

C:\Users\root>vssadmin list shadowstorage
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Shadow Copy Storage association
   For volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Shadow Copy Storage volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Used Shadow Copy Storage space: 179 GB (0%)
   Allocated Shadow Copy Storage space: 229 GB (0%) <-- increased by 50GB
   Maximum Shadow Copy Storage space: 38.2 TB (100%)

This is not an ideal way to push the allocated shadow storage ahead of the amount used because you have little control over the process or amount.

But until Microsoft puts out a proper patch, this will help you hobble along.
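If you want to run those DiskShadow commands unattended rather than typing them in, one option is to write them to a script file and pass it to DiskShadow's script mode (diskshadow -s). Below is a small Python sketch that only generates the script text from the commands shown above. The function names and the output file name are hypothetical, and it assumes %VSS_SHADOW_1% refers to the snapshot that was just created, exactly as in the interactive session.

```python
def diskshadow_bump_script(volume="E:"):
    """Return DiskShadow commands that create and then delete one snapshot.

    Mirrors the interactive session: the create step is what pushes the
    allocated shadow storage ahead; the delete step removes the snapshot.
    """
    return "\n".join([
        "SET CONTEXT PERSISTENT NOWRITERS",
        f"ADD VOLUME {volume}",
        "CREATE",
        "DELETE SHADOWS ID %VSS_SHADOW_1%",
        "EXIT",
        "",
    ])


def write_script(path, volume="E:"):
    """Write the commands to `path` with Windows (CRLF) line endings."""
    with open(path, "w", newline="\r\n") as f:
        f.write(diskshadow_bump_script(volume))
```

Usage would be something like write_script("bump_e.txt") followed by running diskshadow -s bump_e.txt from an elevated prompt. It has the same drawback as the manual version: you still have no control over how much space gets added.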


Tuesday, February 5, 2019 6:25 AM

Hi,

Appreciate your update.

It will be very beneficial for other community members who have similar questions.

Best Regards,
Frank

Please remember to mark the replies as answers if they help.
If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com


Tuesday, February 19, 2019 7:05 AM

Brain2000, thanks for your advice!

I've applied your script, and I must report that, unfortunately, instead of event ID 25 from volsnap you may now face event ID 87 (Volume X: is offline for shadow copy protection. The shadow copy storage has been exhausted. Please try clearing the protection fault or restart the computer followed by an increase of the shadow copy storage or a removal of unneeded shadow copies. If all else fails, revert out of shadow copy protection mode to reclaim the use of the volume while losing the shadow copies.) or even event ID 89 (same as 87, but about a read/write error), which then leads to an endless stream of event ID 57 from ntfs (The system failed to flush data to the transaction log. Corruption may occur).

Yes, the shadow copies won't be removed at all, but the target volume will go completely offline, and very likely you won't even be able to reboot the affected server properly; only a hard reset will bring the affected volume back online. This was probably triggered by large file processing at night, but it's still not acceptable at all.

All of that happened on 2008R2 with the latest updates.

It would also be great to have a script that reverts the Volume Protection Level change back to the default.


Thursday, February 21, 2019 8:29 PM | 1 vote

Thank you for the reply.

You are correct, I have been getting event ID 87, where it takes the entire volume offline. While this is better than outright deleting my snapshot, it's still far from ideal!

So here is what I have found:

1) If you run chkdsk e: /c /i it will increase the shadowstorage by the MinDiffAreaFileSize (50GB in my case). I am trying to reverse engineer what chkdsk.exe does at the very start of its run, so I can create a function that increases the shadowstorage space and skips the disk check part.

2) If your volume does go offline (because you weren't running chkdsk when it got down to 1GB of shadowstorage space), you can bring it back online without rebooting. Here is the PowerShell script:

#if the IO's get too high, Microsoft will either delete your snapshot or take your entire volume offline (depending on the configured protection level)
#this will clear the fault to get the volume back online
function ClearVolumeProtectFault
{
    [OutputType([Alphaleonis.Win32.Vss.VssVolumeProtectionInfo])]
    param
    (
        [Parameter(Mandatory=$true)][string]$volume
    )
    [Alphaleonis.Win32.Vss.IVssSnapshotManagement]$m = $null;
    [Alphaleonis.Win32.Vss.IVssDifferentialSoftwareSnapshotManagement]$d = $null;
    try
    {
        if($volume[$volume.Length - 1] -ne "\") { $volume += "\"; }
        $vss = [Alphaleonis.Win32.Vss.VssUtils]::LoadImplementation();
        $m = $vss.GetSnapshotManagementInterface();
        $d = $m.GetDifferentialSoftwareSnapshotManagementInterface();
        $vpi = $d.GetVolumeProtectionLevel($volume);
        if($vpi.VolumeIsOfflineForProtection)
        {
            #THIS WILL BRING THE VSS VOLUME BACK ONLINE
            #IT CAN TAKE UP TO A MINUTE OR TWO
            $d.ClearVolumeProtectFault($volume);
        }
        return $vpi;
    }
    finally
    {
        if($d -ne $null) { $d.Dispose(); }
        if($m -ne $null) { $m.Dispose(); }
    }
}

Then you call it like this:

$result = ClearVolumeProtectFault "E:";
if($result.VolumeIsOfflineForProtection) { Write-Output "VSS was offline but should be back now. Quick, run chkdsk before Microsoft jacks your volume again!"; }


Thursday, February 21, 2019 8:32 PM | 1 vote

I forgot to mention, my script will allow you to revert the volume protection level changes back to default. The 2nd parameter is where you set that. Just use "OriginalVolume" instead of "Snapshot".

i.e. SetVolumeProtectionLevel "E:\" OriginalVolume


Thursday, February 21, 2019 8:40 PM | 1 vote

I found a better way to push the shadowstorage ahead by 50GB. Simply run chkdsk against the volume.

The first thing it does, before checking anything on the volume, is increase the shadowstorage size by the MinDiffAreaFileSize (50GB in my case).

I would like to isolate what it is doing, but have not been able to do so yet. That way I could skip the disk check part, and write a function to just increase the shadowstorage size.

To at least minimize the chkdsk effects, you can include the /C and /I parameters, which will cause it to run a less vigorous NTFS check.

C:\> chkdsk E: /c /i
... churn churn churn churn…

C:\> vssadmin list shadowstorage
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Shadow Copy Storage association
   For volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Shadow Copy Storage volume: (E:)\\?\Volume{8a5b0a31-7564-491e-a150-2fc8889ae217}\
   Used Shadow Copy Storage space: 0.99 TB (2%)
   Allocated Shadow Copy Storage space: 1.04 TB (2%) <-- increased by 50GB
   Maximum Shadow Copy Storage space: UNBOUNDED (43920964%)

Saturday, February 23, 2019 7:47 PM | 1 vote

I discovered that chkdsk creates a temporary FileShare context snapshot, and deletes it when it has completed. This is what causes the allocated shadowstorage to increase.

So I have created a PowerShell VSS watchdog function that creates and deletes a snapshot whenever the amount of free VSS space drops below a threshold. While it's not the most efficient thing in the world, it's better than nothing.

#this will monitor the snapshotstorage freespace every minute, and if it falls below the threshold, it will resize it by creating/deleting a snapshot
#keeping the snapshot storage with ample space prevents Microsoft from deleting your snapshot or taking your entire volume offline (depending on the configured protection level)
function VSSWatchDog
{
    param
    (
        [Parameter(Mandatory=$true)][string]$volume,
        [Parameter()][Int64]$threshold = 1024L * 1024L * 1024L #1GB default threshold
    )

    [Alphaleonis.Win32.Vss.IVssSnapshotManagement]$m = $null;
    [Alphaleonis.Win32.Vss.IVssDifferentialSoftwareSnapshotManagement]$d = $null;
    try
    {
        if($volume[$volume.Length - 1] -ne "\") { $volume += "\"; }

        $vss = [Alphaleonis.Win32.Vss.VssUtils]::LoadImplementation();
        $m = $vss.GetSnapshotManagementInterface();
        $d = $m.GetDifferentialSoftwareSnapshotManagementInterface();

        while($true)
        {
            try
            {
                $sizes = $d.QueryDiffAreasForVolume($volume);
                if($sizes.Count -gt 0)
                {
                    if($sizes[0].AllocatedDiffSpace - $sizes[0].UsedDiffSpace -le $threshold)
                    {
                        #resize shadowstorage diff area by creating and deleting a snapshot
                        [Alphaleonis.Win32.Vss.IVssBackupComponents]$c = $null;
                        $c = $vss.CreateVssBackupComponents();
                        try
                        {
                            $c.InitializeForBackup([NullString]::Value);
                            $c.SetContext([Alphaleonis.Win32.Vss.VssSnapshotContext]::FileShareBackup);
                            $null = $c.StartSnapshotSet();
                            $null = $c.AddToSnapshotSet($volume);
                            $c.DoSnapshotSet();
                            $c.AbortBackup();
                        }
                        finally
                        {
                            $c.Dispose();
                        }
                    }
                }
            }
            catch
            {
                #ignore errors
            }

            #wait 1 minute between size checks
            Start-Sleep -Milliseconds 60000
        }
    }
    catch
    {
        Write-Error $_;
    }
    finally
    {
        if($d -ne $null) { $d.Dispose(); }
        if($m -ne $null) { $m.Dispose(); }
    }
}

Thursday, January 16, 2020 11:41 AM

Dear all,

I'm having the same issue, and I wasn't able to build anything with the AlphaVSS library.

Does anyone have a compiled version?

Thank you