MABS V3 UR2 - ReFS Slowdown and then Corruption Nightmare

microchipmatt 6 Reputation points
2023-03-26T14:59:42.3066667+00:00

I am attempting to use MABS V3 UR2. I installed it 3 weeks ago, and I have been running into the same ReFS nightmare that is well documented in this thread:

https://social.technet.microsoft.com/Forums/en-US/7e4e4da4-1168-46cd-900f-9ca2bc364d5a/dpm-2016-mbs-performance-downward-spiral?forum=dataprotectionmanager

I am running it on a VM with LOTS of resources. The VM is running Windows Server 2019, and MABS is the newest version, as per Microsoft best practice. I have a datastore, healthy at the block and drive level, that is connected via iSCSI. When I first set up MABS, I connected the datastore, MABS formatted it as ReFS, and all was well. I set up my protection groups and took my first set of backups. They were ALL successful. I made sure to add my protection groups one by one, so as not to overload the ReFS replica creation process (a known issue with ReFS and newer versions of MABS, apparently NOT an issue on Windows Server 2012 and below running old versions of DPM). Apparently, when DPM (which became MABS) used NTFS, NONE of this was an issue. When MABS started requiring ReFS and excluded NTFS, that is when ALL the issues started.

But back to the issue. After a week, I started experiencing the classic MABS/DPM/ReFS slowdown. I would reboot and it would seem a little better, but as the replicas took more backups, the system, and the ReFS replicas in particular, slowed right down.

After about a week, the ReFS datastore would become corrupt. This is the exact same issue as in the thread above, where you can see that several of the users had to KEEP redoing their datastores... insane.

I checked my datastore and it was FULLY healthy at the block and drive level. Same as everyone else: NOT a datastore issue. So I removed all my machines from the protection group, formatted my datastore, added the machines back to the protection group, and resynced them all without issue. A terrible process.

A week later, same issue. Replicas slowed to a halt, and the datastore that MABS had formatted as ReFS went corrupt. The datastore itself was still healthy. Not a datastore issue. You can see from the thread above that all these users are running new versions of DPM or MABS, all have enterprise-grade datastores of various brands, and they ALL experience this issue (once again, MABS being the newer incarnation of DPM).

Round three, rinse and repeat, and as of this morning I'm getting a BSOD with the error "System Thread Exception Not Handled, ReFS.sys not loaded", which to me says ReFS is corrupt again. Once again, the datastore is healthy at the drive and block level, no errors are detected, and it reports as Healthy.

Before anyone says use NTFS, I cannot. Once again, ALL newer versions of DPM and MABS require ReFS; you CANNOT use NTFS. There is no option. ReFS is a requirement.

Does anyone have any suggestions for how to stop this issue? I can see from the locked thread above that even up to 2022 this is still an issue with DPM and MABS, and it is acknowledged that Microsoft has yet to solve it. It doesn't seem to matter whether MABS is using JBOD and Windows Storage Spaces with software RAID or an iSCSI datastore with hardware RAID; in both scenarios, most users seem to be experiencing this issue. Worse yet, the thread above is locked and unresolved, and I have seen newer threads up until late 2022 reporting the same issue with newer versions of DPM and MABS.

Before anyone says, "Well, I used ReFS and I have not had this issue": it seems to happen specifically with MABS, ReFS, and backup tasks. The reason I say this is that it is what the threads show, and Veeam users who are using ReFS report the same slowdown and corruption issues. If you are using ReFS just for general filesystem storage, it does NOT seem to be as much of an issue. However, anything backup-related that writes and reads heavily seems to be affected. I don't know what to do now. My new backup system was going to be MABS, but I cannot get a stable ReFS system without ReFS becoming corrupt.

Does anyone have any suggestions?

Azure Backup