Hyper-V with Virtual Fiber Channel Broken for Rocky Linux 8.8 and Newer Guest

DuaneBV 0 Reputation points
2023-06-21T18:57:22.9066667+00:00

Issue: When attempting to write to a mounted filesystem and/or to format the device that represents a Fiber Channel LUN, the SCSI bus effectively panics: the I/O thread for the operation hangs in uninterruptible sleep (D+ state), and dmesg endlessly repeats lines matching this:

hv_storvsc <Device GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001

A reboot is required to recover the filesystem, but the problem repeats as soon as a write operation hits the LUN or mounted filesystem. Let me provide some background.
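For anyone trying to reproduce this, a quick way to spot the hung I/O thread is to scan for processes in uninterruptible sleep, the D/D+ state described above. A minimal sketch that reads /proc directly (Linux-only; process names containing spaces may shift fields, which is fine for a quick check):

```shell
# Scan /proc for processes in uninterruptible sleep (state D) -- the
# symptom of the hung write thread described above. Needs no extra tools.
for s in /proc/[0-9]*/stat; do
    # Field 3 of /proc/<pid>/stat is the process state letter.
    awk '$3 ~ /^D/ {print "pid", $1, "state", $3}' "$s" 2>/dev/null
done
echo "scan complete"
```

On an affected guest, the stuck writer (e.g. mkfs or dd) should show up here until the system is rebooted.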

I have been running a Failover Cluster with Hyper-V for over a decade, with both Windows and Linux guests. For some guests, I add the Virtual Fiber Channel Adapter to the VM configuration and zone LUNs from my 3PAR storage directly to the VM. This allows me to easily migrate the VMs between clusters. This worked well until a few weeks ago, when Rocky Linux 8.8 was released. The same issue subsequently appeared with Rocky Linux 9.2, while 8.7 and 9.1 worked fine, but I'll stick with 8.8 for my discussion. After the upgrade to the Linux guest OS, the new kernel that causes the issue is 4.18.0-477.10.1.el8_8.x86_64. If I reboot the system, leave all other updated files alone, and choose the previous working kernel (4.18.0-425.19.2.el8_7.x86_64), the issue no longer occurs. It is also still present in the next kernel update, 477.13.1, released since the original 8.8 release.

I tested and verified that the issue does not occur with the 3PAR LUNs zoned directly to bare metal using the same HBAs that the Hyper-V servers use, on the same server hardware. I also verified that it does not occur if I zone a disk to the Hyper-V cluster hosts and assign the cluster disk to the VM role, but that limits my cross-cluster migrations. It only occurs through the Virtual Fiber Channel adapters used by Hyper-V. This is on a closed network, so I apologize that I cannot export actual logs, but I really need some help from the Microsoft side, as this appears to be tied to the Hyper-V virtual HBA's communication with the 3PAR. Hopefully some of you have seen this or can confirm my findings.

3PAR SAN: 3PAR 8440

Hyper-V Host: Windows Server 2019 (OS Build: 17763.4499)

Physical Host: HP ProLiant BL460c Gen8 Blade Server

Physical HBA: QLogic QMH2572 8Gbps (Dual Port)

Hyper-V Role Configuration Version: 9.0

Linux Guest OS: Rocky Linux 8.8

Linux Guest Config: Secure Boot disabled, 4 cores, 8192 MB RAM, All Integration Services enabled, hyperv-* packages installed in the guest


4 answers

  1. Limitless Technology 44,711 Reputation points
    2023-06-22T12:57:48.3933333+00:00

    Hello there,

    If you're experiencing issues with Hyper-V and Virtual Fiber Channel (vFC) when running Rocky Linux 8.8 or newer as a guest, it's possible that there may be compatibility or driver-related issues. Here are a few steps you can take to troubleshoot the problem:

    Ensure compatibility: Verify that your version of Hyper-V is compatible with Rocky Linux 8.8 or newer. Check the system requirements and supported guest operating systems for your specific version of Hyper-V.

    Update Hyper-V Integration Services: Ensure that the Hyper-V Integration Services are up to date on the Rocky Linux guest. Integration Services provide drivers and services for enhanced performance and functionality. Check for updates and install the latest Integration Services on the guest.

    Verify Virtual Fiber Channel configuration: Double-check the Virtual Fiber Channel configuration on the Hyper-V host and the guest. Ensure that the necessary virtual switches, virtual HBAs, and WWPNs (World Wide Port Names) are properly configured.

    Update Rocky Linux: Ensure that the Rocky Linux guest is running the latest updates and patches. Use the package manager (yum or dnf) to update the guest operating system to the latest available version.

    Install Hyper-V-specific drivers: Check if there are any Hyper-V-specific drivers available for Rocky Linux. These drivers are designed to improve compatibility and performance when running Rocky Linux as a guest on Hyper-V. Check the Rocky Linux documentation or community forums for any available Hyper-V drivers.

    Check for known issues: Search for known issues or incompatibilities between Rocky Linux and Hyper-V. Check the Rocky Linux release notes, forums, and community resources to see if there are any reported issues or workarounds related to Hyper-V and Virtual Fiber Channel.
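A quick way to perform the driver check suggested above from inside the guest is a sketch like this (hv_storvsc is the Hyper-V paravirtual storage driver that logs the errors shown in the question):

```shell
# Quick check that the Hyper-V storage driver (hv_storvsc) is loaded;
# on a non-Hyper-V machine this simply reports that it is absent.
if grep -q '^hv_storvsc' /proc/modules 2>/dev/null; then
    echo "hv_storvsc is loaded"
else
    echo "hv_storvsc is not loaded (not a Hyper-V guest, or driver built into the kernel)"
fi
```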

    I used AI provided by ChatGPT to formulate part of this response. I have verified that the information is accurate before sharing it with you.

Hope this resolves your query!

    --If the reply is helpful, please Upvote and Accept it as an answer--


  2. DuaneBV 0 Reputation points
    2023-07-05T15:27:52.3433333+00:00

    I have verified this issue with a vanilla RHEL 8.8 installation as well, so the issue is not directly with Rocky Linux, as I wouldn't expect anyway since it is bug-for-bug compatible. I really need an update to the kernel that addresses what has changed since 8.7 and 9.1 regarding the hv_storvsc and vmbus communication.


  3. DuaneBV 0 Reputation points
    2023-07-11T19:45:18.06+00:00

For those who may find this post, the cause behind my issue has been found. Working with a Microsoft engineer, we discovered that the default value of the max_sectors_kb setting for the Virtual Fiber Channel SCSI devices was changed from 512 to 8192 in the kernels starting with RHEL/Rocky 8.8 and RHEL/Rocky 9.2. This causes the issue I was observing, where the SCSI device goes into an infinite loop attempting to write to the SAN. After changing the value back to 512, it works fine. This is true for both direct and multipath devices.

Making this change on a running system does not persist across reboots, and the value must be set prior to any activity on the device. As a temporary workaround, I have used a setting in my multipath.conf file to set the max_sectors_kb value for my SAN, since all of my devices are under multipathd control. This is detailed here:

    How to set custom 'max_sectors_kb' option for devices under multipathd control? - Red Hat Customer Portal
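For reference, the multipath.conf approach from that article looks roughly like this for a 3PAR array (a sketch; "3PARdata"/"VV" are the usual 3PAR vendor/product identifiers, but verify them against `multipath -ll` output on your own system):

```
devices {
    device {
        vendor          "3PARdata"
        product         "VV"
        max_sectors_kb  512
    }
}
```

After editing, the configuration is typically reloaded with `systemctl reload multipathd` before the affected devices see any I/O.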

Microsoft is working on either a patch to the Hyper-V server or a patch to be submitted to Red Hat for the SCSI Virtual Fiber Channel devices; which one is yet to be determined. I'll post an update when I have one. Cheers.
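For the non-multipath case, one persistent alternative (an untested sketch, not from this thread) is a udev rule that pins the queue limit whenever a matching block device appears; the filename and the vendor match string are assumptions and should be verified with `udevadm info` on the actual device:

```
# /etc/udev/rules.d/99-max-sectors.rules  (hypothetical filename)
# Pin max_sectors_kb back to 512 for 3PAR-backed block devices.
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_VENDOR}=="3PARdata", \
    ATTR{queue/max_sectors_kb}="512"
```

Because udev applies the rule on device add/change events, the limit is set before filesystems are mounted, which addresses the "must be set prior to activity" constraint described above.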


