Deduplication suddenly fails on a General Use file server with "0x80565323, 0x80565309, A required filter driver is either not installed, not loaded, or not ready for service" - Windows Server 2004

Felix Wagner 11 Reputation points
2021-02-23T05:44:30.987+00:00

Hello,

I am writing here to find an answer to, and solve, the issue that caused our newly migrated file server to lose a whole day of data.

The setup:
* A two-node Storage Spaces Direct cluster with all-SSD storage is the base for the VMs of a General Use file server cluster

The issue:

  • At first, Dedup seemed to run fine: data was deduplicated and reported.
  • Then suddenly files were inaccessible. Running Start-DedupJob -Type Unoptimization did not help and was terminated, or the job reported that 56k files were inaccessible.
  • The event log is full of different errors.
  • Event IDs:
  • 6144 - File inaccessible
  • 4137 - Volume not enabled for data deduplication (although Get-DedupVolume and Get-DedupStatus reported otherwise at that time)
  • Again a 4137 with:

0x80565309, A required filter driver is either not installed, not loaded, or not ready for service

This one made me curious. To me it means that in the beginning everything is fine: Dedup is running and the filter driver is attached to the NTFS volume. But then something happens, the filter is removed or something happens to the System Volume Information folder, and Dedup cannot read anything anymore. Nothing was touched in that area, and all folders on the file share are accessible for SYSTEM.

I tried to stop dedup via unoptimization. It only worked if I (roughly as sketched below):
* took a volume without any deduplicated files,
* removed it from the file server role,
* added it as a CSV,
* and then ran the unoptimization job.
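
In PowerShell that workaround looked roughly like this (a sketch only; the resource name and CSV path are placeholders, not my real ones):

# Take the disk out of the file server role and park it in Available Storage
Move-ClusterResource -Name "Cluster Disk 2" -Group "Available Storage"

# Promote it to a Cluster Shared Volume so it is mounted under C:\ClusterStorage
Add-ClusterSharedVolume -Name "Cluster Disk 2"

# Now the unoptimization job actually runs instead of failing immediately
Start-DedupJob -Volume "C:\ClusterStorage\Volume1" -Type Unoptimization -Wait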

All other volumes are not recoverable; every dedup job simply fails with an error.

We have other file servers on different hardware platforms where we use DFS-R. On those servers Dedup is enabled too and works fine.
The difference is that those servers (two of them are served via an SOFS role) use VHDX files for the data. Only this cluster, with the main data on top of the S2D cluster, uses the VHD Set disk type.
I have searched for information for a while now and found nothing. Therefore I am asking here whether this is a known issue between dedup and a volume that is a VHD Set, or whether I am overlooking some other issue.

For now the cluster is offline, so that together with Microsoft I could take a deeper look at this issue. To me this looks critical: even if this is a known unsupported configuration, the Microsoft documentation does not mention it, neither for VHD Sets nor for Data Deduplication, unless I have a blind spot on that sentence.

Hopefully I just hit a rare bug and can help others avoid the same issue.

Kind regards

Felix

Windows for business | Windows Server | Storage high availability | Clustering and high availability
Windows for business | Windows Server | User experience | Other

6 answers

  1. Xiaowei He 9,936 Reputation points
    2021-02-26T06:08:04.357+00:00

    Hi,

    After some research, I found a similar issue, described below:

    Symptoms
    Bugcheck stop 9E or 133 occurs in Hyper-V server with Deduplication enabled. Also, issues of VHD corruption have been reported by customers using Datacore SAN symphony storage pool, when VHDs are stored on volumes with Dedup enabled.

    Cause
    Dedup filter holds the MCB spinlock for too long during periodic volume flush under high churn, triggering this. The issue is more prominent if the VHDs of the VM are not on a CSV volume and hence not opened in write-through mode.

    Resolution
    Increase the frequency of the dedup flush by creating the following registry entry:

    HKLM\System\CurrentControlSet\Services\ddpsvc\Settings\
    FlushMaxDelay DWORD 60

    NOTE: By default, this entry may not be present, in which case the default value is 300 seconds.
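
    If you prefer PowerShell, the value could be created on each node roughly like this (just a sketch; the Settings key may have to be created first, and the Data Deduplication service ddpsvc is restarted so the new value is picked up):

    $key = 'HKLM:\SYSTEM\CurrentControlSet\Services\ddpsvc\Settings'
    if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }   # key may not exist by default
    New-ItemProperty -Path $key -Name 'FlushMaxDelay' -PropertyType DWord -Value 60 -Force | Out-Null
    Restart-Service -Name ddpsvc                                           # restart the Data Deduplication service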

    Please test whether this resolution of increasing the dedup flush frequency works in your case.

    Thanks for your time!
    Best Regards,
    Anne


    2 people found this answer helpful.

  2. Felix Wagner 11 Reputation points
    2021-02-27T23:25:04.78+00:00

    Hello Anne,

    unfortunately your suggestion didn't help :( , so I started a further evaluation of this issue.
    After this investigation it seems that Windows Server 2004 and 20H2 have issues with Deduplication on non-CSV volumes inside a cluster.
    Windows Server 2019 is fine: I created a new (guest) cluster with Windows Server 2019 and everything is working.

    Meanwhile, I came across fltmc.exe. It helped me see what is going on with the Dedup filter.

    So, when I create a new volume on a single node, fltmc.exe instances provides the correct output:

    Dedup 180450 Dedup 0 00000003
    Dedup F: 180450 Dedup 0 00000003

    After I move the volume (the file server role) to the other node and back (the same already applies on the other node), the output is only:

    Dedup 180450 Dedup 0 00000003

    This result can be reproduced every time.
    So it seems that, for some reason, the Dedup filter is not attached when the volume moves.

    This behavior isn't seen if the volume is a CSV. CSVs can be moved without any issue.
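
    For anyone who wants to reproduce the check, this is roughly what I run on the node that currently owns the role (a sketch; the drive letter is only an example, and it just parses the fltmc.exe instances output shown above):

    $drive = 'F:'
    # a healthy volume has a "Dedup   F: ..." instance line; after a failed move it is missing
    $instance = fltmc.exe instances |
        Where-Object { $_ -match '^\s*Dedup\s' -and $_ -match [regex]::Escape($drive) }

    if ($instance) { "Dedup filter instance attached to $drive" }
    else           { "No Dedup filter instance for $drive - matches the 0x80565309 state" }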

    So, I created a new cluster based on Windows Server 2019 and Dedup works there. After creating and then moving the volume, fltmc.exe instances shows this result:

    Dedup 180450 Dedup 0 00000003
    Dedup F: 180450 Dedup 0 00000003

    For me this means that the initial issue lies in this error:

    0x80565309, A required filter driver is either not installed, not loaded, or not ready for service

    The error shows up in more detail after a try with Enable-DedupVolume -Volume F: -DataAccess:

    Failure reason: FSCTL_DEDUP_FILE.DEDUP_SET_CHUNK_STORE_CACHING_FOR_NON_CACHED_IO failed with ERROR_INVALID_FUNCTION for volume

    So something is wrong in Windows Server 2004 & 20H2 in connection with the creation of the Dedup chunk store.

    At this point, I am sorry: I do not know how to copy data out of 'F:\System Volume Information\Dedup\' to provide further assistance. Those files are simply inaccessible to me; neither Copy-Item nor Get-Content provides access from a remote PowerShell session (CredSSP is disabled in our environment due to missing knowledge and known security issues). Well, I have solved my issue for now. Even more, with this error in the background, I will deploy the file share for my company on Windows Server 2019. I wouldn't have thought before that this would be an issue for a file share, but well, deduplication isn't something easy. ;) :) (Our AD CS servers are working quite well on Windows Server with upgrades.)

    Moreover, I would like to try out how a Windows Server 2019 node behaves inside a cluster with those 20H2 servers, but there already seems to be an incompatibility, so no chance.

    So for me the issue is closed for now. But, well, I feel that Microsoft QA should take a deeper look here before the next LTSC release of Windows Server ships with this bug. ;) ;) :)
    The good thing for me is that I learned a lot about volumes, drivers, filter drivers, how all those great things with CSVs work together, and why a file share on an SOFS is a very bad idea. For sure it is. :D

    Kind regards

    Felix

    2 people found this answer helpful.

  3. Felix Wagner 11 Reputation points
    2021-02-23T19:34:11.607+00:00

    As a follow-up, to give a bit more information:

    Meanwhile, I saw this error but cannot really make sense of it:

    Data Deduplication error: Unexpected error.

    Operation:
    Processing deleted chunk store streams.
    Indexing active chunk references.
    Starting chunk store garbage collection.
    Running the deduplication garbage collection job.

    Context:
    File name: \\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67}\System Volume Information\Dedup\ChunkStore{1B8EC874-DD08-42FE-A0FF-FAC35A2FEC9A}.ddp\Stream\00290000.00000002.ccc
    Chunk store: \\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67}\System Volume Information\Dedup\ChunkStore{1B8EC874-DD08-42FE-A0FF-FAC35A2FEC9A}.ddp\Stream
    Volume name: C:\ClusterStorage\Volume1 (\\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67})

    Error-specific details:
    Error: CDedupFilter::DeviceIoControl(\\?\Volume{fad55a56-d363-496f-8f37-ef1707b40a67}\System Volume Information\Dedup\ChunkStore{1B8EC874-DD08-42FE-A0FF-FAC35A2FEC9A}.ddp\Stream\00290000.00000002.ccc, FSCTL_DEDUP_FILE, ...), 0x80070001, Incorrect function.

    The other error which I continuously receive is this one:

    Failure reason: FSCTL_DEDUP_FILE.DEDUP_SET_CHUNK_STORE_CACHING_FOR_NON_CACHED_IO failed with ERROR_INVALID_FUNCTION for volume

    I assume that this one is caused by the issues above.

    I would like to make one thing clear at this point:
    We are talking about storage that is a VHD Set. The VM with the VHD Set runs directly on a Windows Server 2019 S2D HCI cluster which tested at 750k IOPS and 1500 MB/s read throughput; it has since been extended by another 10 SSDs (hence over 1 million now?). The S2D is configured as nested mirror. The S2D itself looks very fine: latency is in the microsecond range for 99.99% of the day, the rest are short peaks up to a maximum of 3 ms, and no such peak happened at the time dedup failed. So I assume the storage subsystem is fine.
    The issue lies somewhere between the Hyper-V VM, the controller driver (there are some errors inside the VM event log too) and the dedup driver.

    The question remains why the chunk store gets corrupted, or at least appears corrupted to the Dedup filter driver.
    Can anyone assist here?


  4. Felix Wagner 11 Reputation points
    2021-02-26T16:50:20.42+00:00

    Hello Anne,

    thank you for your advice. I am not sure it applies to my situation, but I am starting to understand what this parameter does and why it could be helpful, so I will implement it and try it out. Thank you for helping here.

    Still, I am not sure whether it applies to the situation, because:

    • Deduplication is not running on the S2D cluster itself
    • Deduplication failed inside a VM on top of that S2D, so there is no Hyper-V role in this "VM"
    • That "VM" is actually a cluster of two VMs.

    Meanwhile, I rebuilt the storage from scratch. I updated both VMs to Windows Server 20H2 and applied all updates (except previews).
    While configuring the storage again, I recollected what I did the first time:

    • The first time, I simply mounted the VHD Sets to both VMs
    • I initialized them via Server Manager
    • I created the volumes right away; at this step, I believe, Deduplication was not available in Server Manager
    • After creating the volumes I went into Failover Cluster Manager, added the disks as Available Storage and created the role
    • After the disk was attached to the File Server General Use role, I enabled Deduplication via Server Manager (now it was possible)

    A deeper look into Event Viewer showed that the errors mentioned started right during this process. I believe something went wrong there. I am sure that none of those disks was writable on both nodes at the same time (the second node always showed the read-only flag inside Server Manager).

    Now I have recreated the VHD Sets. This time I adjusted the process (roughly as sketched after the list):

    • Created the disks and mounted them to both VMs
    • Initialized each disk on a single VM via the PowerShell cmdlet Initialize-Disk -Number
    • Ran cluster validation
    • Added the disks as Available Storage to the cluster
    • Added the disks to the File Server role
    • Created the volumes in Server Manager; now Deduplication could be enabled directly without any issues
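
    In PowerShell the adjusted process looks roughly like this (a sketch only; the disk number, role name and drive letter are placeholders, and the volume/dedup steps are the PowerShell equivalents of what I clicked in Server Manager):

    Initialize-Disk -Number 2 -PartitionStyle GPT               # on ONE node only

    Test-Cluster                                                # cluster validation before adding storage
    Get-ClusterAvailableDisk | Add-ClusterDisk                  # add the new disk as Available Storage

    # move the cluster disk into the existing File Server role before creating the volume
    Move-ClusterResource -Name 'Cluster Disk 2' -Group 'FS-GeneralUse'

    # only now create the volume and enable Deduplication on it
    New-Volume -DiskNumber 2 -FriendlyName 'Data' -FileSystem NTFS -DriveLetter F
    Enable-DedupVolume -Volume 'F:' -UsageType Default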

    After half a day it now looks like everything is running fine. The event log shows not a single error or warning, only normal informational messages saying everything is fine. Shadow copies are not yet enabled on these volumes, as DFS-R from the old file share server is still running (my approach to carefully test the new deployment now).

    So could it be that the wrong process damaged the volumes? I am not sure how I set up the other servers, but since this cluster is the only one using shared storage, that may be the case, no?

    So it seems my knowledge about volume management and how clusters interact with it is missing some important part here.

    Kind regards

    Felix


  5. Felix Wagner 11 Reputation points
    2021-02-27T01:22:53.123+00:00

    Hello Anne,

    an update on this case:
    Deduplication is broken again. To apply your suggested settings, I restarted one node and moved the role manually. Right after moving the role, the error with the filter driver appeared again:

    Data Deduplication failed to start job type "Optimization" on volume "\\?\Volume{ba127eb2-8560-4a59-83d9-54d7e55c752f}\" with error "0x80565309, A required filter driver is either not installed, not loaded, or not ready for service.".

    So it seems the issue appears right at the moment the role is moved. Deduplication is definitely installed on both nodes (the repro is sketched below).
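
    The repro is simple (role and node names are examples; the Deduplication operational log is what I see under Applications and Services Logs, adjust the log name if it differs on your side):

    Move-ClusterGroup -Name 'FS-GeneralUse' -Node 'NODE2'

    # then check the Deduplication operational log on the new owner node
    Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Operational' -MaxEvents 20 |
        Where-Object { $_.Id -in 4137, 6144 } |
        Format-List TimeCreated, Id, Message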

    Kind regards

    Felix

