Windows Server 2019 dedup compatibility issues after 2021-06 Cumulative Update (KB5003646)

Chris 21 Reputation points
2021-06-24T04:31:08.097+00:00

I have discovered a compatibility issue with the Windows Deduplication feature on Windows Server 2019 after the 2021-06 Cumulative Update (KB5003646) is applied.

If the Windows Server 2019 machine has a volume attached which was originally deduped on a Windows Server 2016 machine, then it will BSOD with a crash relating to dedup.sys.

The dedup.sys driver for Windows Server 2019 was updated between 2021-05 CU (file version 10.0.17763.1554) and 2021-06 CU (file version 10.0.17763.1971).

I have reproduced this issue with a clean install of Windows Server 2019 from the RTM ISO with no additional software or configuration changes. Up to and including OS build 17763.1935 (2021-05 CU) there are no issues accessing data on a volume originally deduped on Windows Server 2016. Once the system is updated to OS build 17763.1999 (2021-06 CU), accessing data on that same volume causes a BSOD relating to dedup.sys.
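
For anyone auditing several servers, the two dedup.sys file versions quoted above can be used as rough thresholds. The following is a minimal Python sketch, not an official check: the function names and result labels are hypothetical, the thresholds are only the two versions reported in this post, and versions in between or later than these were not tested here.

```python
# Thresholds taken from the versions quoted in this post; everything else
# (function names, result labels) is a hypothetical illustration.
LAST_GOOD = (10, 0, 17763, 1554)   # dedup.sys shipped with the 2021-05 CU
FIRST_BAD = (10, 0, 17763, 1971)   # dedup.sys shipped with the 2021-06 CU
SERVER_2019_BRANCH = (10, 0, 17763)


def parse_version(text):
    """Turn '10.0.17763.1971' into the tuple (10, 0, 17763, 1971)."""
    return tuple(int(part) for part in text.strip().split("."))


def classify_dedup_driver(file_version):
    """Classify a dedup.sys file version against the two reported versions."""
    version = parse_version(file_version)
    if version[:3] != SERVER_2019_BRANCH:
        # Only Windows Server 2019 (17763) builds are discussed in this thread.
        return "other branch"
    if version <= LAST_GOOD:
        return "no crash reported"
    if version >= FIRST_BAD:
        # Later versions may or may not contain a fix; treat them as suspect.
        return "suspect"
    return "untested"


if __name__ == "__main__":
    for candidate in ("10.0.17763.1554", "10.0.17763.1971"):
        print(candidate, "->", classify_dedup_driver(candidate))
```

The file version itself has to be read from the driver on each server (for example from the file's properties); that step is left out of the sketch.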

The issue is not present for volumes which were first deduped on Windows Server 2019. It only occurs when the volume was first deduped on Windows Server 2016.

This was originally discovered when performing a rolling OS upgrade of a failover file server cluster running Windows Server 2016 nodes which use deduped volumes for file shares. Once the roles with the deduped volumes were moved to the first Windows Server 2019 node it caused that node to BSOD.

Windows Server

8 answers

  1. Chris 21 Reputation points
    2021-07-05T23:24:27.107+00:00

    I have had a further update from Microsoft support that they have acknowledged the issue and the development team is looking at it but “it may take a long time to get a released fix”.

    I would say that this will go in the large basket of bugs that never get fixed, so the best way to deal with it, if at all possible, is to re-attach the volume to a Server 2016 machine, disable dedup and unoptimise the volume, then re-attach the volume to Server 2019, enable dedup and optimise the volume.

    Obviously that’s not going to be possible if you have a large amount of data and/or it is a physical server, but I really don’t see Microsoft fixing this given that it is still not even a documented, acknowledged issue.

    1 person found this answer helpful.

  2. Teemo Tang 11,356 Reputation points
    2021-06-24T06:34:48.627+00:00

    Your discovery and reproduction steps are valuable, thank you for your effort. I will submit this situation to Microsoft.
    However, in-place upgrades are never recommended. In fact, since Windows 10/Server 2016 was released, the upgrade process is essentially a clean install followed by a data migration. It's actually quite a sweet technique they're moving towards, where it's almost like partition-based installs but with the same registry/data/programs folders.
    Therefore, if you clean install Windows Server 2019 and then update it to build 10.0.17763.1971, I think the Windows Deduplication feature will not cause the BSOD issue. Look at this similar case:
    https://learn.microsoft.com/en-us/answers/questions/435392/post-server-2016-gt-2019-upgrade-dedupsys-bsod.html?childToView=435473#answer-435473



  3. Chris 21 Reputation points
    2021-06-25T05:01:04.473+00:00

    Hi TeemoTang-MSFT,

    Thank you for your assistance.

    I would also add the following:

    • The BSOD doesn't occur as soon as you access the volume, but rather when you try to read a large chunk of data from it. The most reliable way I found to reproduce the crash is to start an antivirus scan targeting the deduped volume. My tests were done with Windows Defender, not third-party AV, but I have also confirmed the same behavior occurs if Windows Defender is removed and a third-party AV is installed.
    • I have tested with two different volumes which were originally created and deduped on Windows Server 2016 and then moved to a Windows Server 2019 system with the 2021-06 CU, so the issue is not limited to one specific volume.

  4. Stefan 6 Reputation points
    2021-07-02T10:38:45.717+00:00

    Hi,

    Can we have an update on this problem, please?
    Also, this should be communicated immediately at a higher level, so other customers have a chance to react.

    This just turned out to be a huge issue for our production environment. We literally lost money due to an outage of more than an hour.

    All the details described above apply to us. Volumes originally deduped on server 2016, now 2019 server.
    We had one node fail 4 days ago and a second node fail 3 days ago. So far no big deal, failover handled it. We deactivated AV to reduce the amount of data being read.

    Today both nodes failed simultaneously. A restart did not help, with a BSOD as soon as 30 secs after reboot.
    We aimed to uninstall the cumulative update, which is impossible with a BSOD immediately after startup. Booting in safe mode was no solution since the uninstall did not work there. I guess some required services were missing, but who knows which ones to start...
    After some failed reset attempts I was lucky enough to prevent the File Server role from starting up. At this point the server did not crash and I was able to uninstall the cumulative update. But even then I had to wait 15 - 30 mins at a "100% applying updates" boot screen, not sure if the machine was stuck or doing anything...
    Back in Windows without the latest update, all seems to be working fine atm.


  5. Chris 21 Reputation points
    2021-07-02T11:45:26.503+00:00

    I have a support case open with Microsoft for this bug but I have also had no updates since providing them with the crash dump and event logs 5 days ago, despite being told it would take 1-2 days to analyse.

    In my situation I have been fortunate enough to be able to go back to Server 2016, and I am considering disabling dedup and unoptimising the volumes on Server 2016, then creating new Server 2019 nodes and deduping the volumes again (I’ve tested this and it does appear to work around the issue). My dataset is not so big that this is out of the question. Even if Microsoft do eventually fix their bug, who’s to say the same thing won’t happen again with an update down the track, as there is clearly a difference between dedup volumes created in Server 2016 and those created in Server 2019.
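
The workaround described in this thread (unoptimise on Server 2016, then dedup again on Server 2019) could be laid out roughly as follows. This is a hedged Python sketch that only builds and prints the ordered steps. `dedup_rebuild_plan` and the `E:` drive letter are hypothetical, the `Default` usage type is an assumption, and the PowerShell Deduplication cmdlets named in the strings are my mapping of the steps, not commands quoted by the posters. Nothing here is executed against a server.

```python
def dedup_rebuild_plan(volume):
    """Return (host, PowerShell command) pairs in the order described in the thread.

    The volume is assumed to be re-attached to the named host before each step.
    """
    return [
        # On the Server 2016 machine: stop further optimisation, then rehydrate.
        ("Server 2016", f"Disable-DedupVolume -Volume {volume}"),
        ("Server 2016", f"Start-DedupJob -Volume {volume} -Type Unoptimization -Wait"),
        # On the Server 2019 machine: enable dedup again and re-optimise.
        ("Server 2019", f"Enable-DedupVolume -Volume {volume} -UsageType Default"),
        ("Server 2019", f"Start-DedupJob -Volume {volume} -Type Optimization"),
    ]


if __name__ == "__main__":
    for host, command in dedup_rebuild_plan("E:"):
        print(f"[{host}] {command}")
```

As noted above, this only suits datasets small enough to be held fully unoptimised while the volume is moved between machines.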
