I escalated to HPE and got an answer! They first provided a link to a Microsoft KB article, but Microsoft has since removed it in order to update it and republish it later (I don't know when). Here is the information I got from the HPE L3 engineer (the answer was in French; I hope the translation is clear):

"Posted an update on case SIE268522, update #57, on January 9, 2024, to indicate that Microsoft replaced the two KB articles with old learning articles without informing HPE. We then found out from Microsoft that they removed these articles because they wanted to update their content. The root cause of the issue is that in WS2019 with Hyper-V enabled, on EPYC Gen1/2 processors, the root/host OS would indicate ssbd support in the global speculation data structure. However, on EPYC Gen3/4 processors, the root/host OS would not indicate ssbd support in the global speculation data structure. The fix is in WS2022. Once Microsoft KB articles are available again, we will notify the customer."

So according to HPE, who are themselves relaying Microsoft, migrating to Server 2022 should solve this issue. I haven't migrated yet (my hypervisors are still on 2019), so I can't confirm this solution. But I'm happy, as my investigations were heading in that direction: I was more than convinced that the exposure of processor capabilities was part of the problem :) I hope this helps!
LiveMigration failure on Hyper-V cluster
I have a tricky one here, and I hope someone will have an idea about what is going on.
I was hired a few months ago at a company that has a 4-server Hyper-V cluster.
The servers were bought separately: let's say servers 1 and 2 first, and servers 3 and 4 two years later. All are the same model from HPE; only the processor generation differs, as two servers are older. Hosts run Server 2019, VMs are a mix of 2016 and 2019, and all of them are up to date with Microsoft's monthly security patches.
Before I arrived, Live Migration was working perfectly, but between the time the previous admin left and my arrival, it started to fail frequently for some VMs, specifically when draining a node to patch it.
The role raises a 21502 error (Migration failure for virtual computer XXXX to destination host YYYY) when trying to live migrate, but with no additional information.
Investigating deeper, I noticed that if a VM has been started on one of the older hosts, it can be live migrated to any host in the cluster.
If a VM has been started on one of the newer hosts, it can be live migrated only to the newer hosts.
Running Compare-VM against one of the older hosts reveals an incompatibility with ID 21026, but doesn't give any more clues about what is causing it.
Processor compatibility is enabled for all VMs, I checked that all networks are declared the same way on each host, and I even compared the processor options in each host's UEFI settings, but I have no clue what is causing this issue.
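For anyone wanting to reproduce the two checks described above, here is a minimal sketch. It assumes it is run on the host that currently owns the VM, and `XXXX` and `YYYY` are just the placeholders from the error text, not real names.

```powershell
# Placeholders from the 21502 error text above; substitute real names.
$vmName   = 'XXXX'   # VM that fails to live migrate
$destHost = 'YYYY'   # one of the older hosts

# Ask Hyper-V whether the VM could be moved to the destination host.
$report = Compare-VM -Name $vmName -DestinationHost $destHost

# List every reported incompatibility with its ID (21026 in this thread),
# its message, and the object it relates to.
$report.Incompatibilities | Format-List MessageId, Message, Source

# Confirm that processor compatibility mode is actually enabled on the VM.
Get-VMProcessor -VMName $vmName |
    Select-Object VMName, CompatibilityForMigrationEnabled
```

In this case the output is not expected to say more than the 21026 ID already mentioned, but it makes it easy to compare the result for the same VM against an older and a newer destination host.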
I followed this guide from Microsoft, but no luck either: https://learn.microsoft.com/en-us/troubleshoot/windows-server/virtualization/troubleshoot-live-migration-issues
Does anyone have ideas on what to investigate next?
3 answers
Amit Singh 4,901 Reputation points
2023-03-22T10:14:04+00:00
Check whether the antivirus on the host computer is causing the problem. Exclude the CSV from its on-access scanning.
Kristian Halvorsen 0 Reputation points
2023-11-29T10:06:03.24+00:00
Hello,
Did you ever find a solution? I have the EXACT same problem.