Hyper-V (2022) iSCSI MPIO CSV settings for resiliency

Andy Summers 0 Reputation points
2024-04-24T09:37:18.0633333+00:00

I have a working production 2-node Server 2022 Hyper-V cluster that uses QNAP iSCSI for CSV storage.

I only have a limited number of 10Gb ports, so I configured a second iSCSI connection over a 1Gb link to a different switch.

I've configured MPIO with weighted paths because I didn't want it failing over to the 1Gb link and then not moving back to the 10Gb link. I've given the 1Gb path a weight of 10,000.
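
As a rough illustration of the weighted-paths behaviour described above, the Python sketch below models the rule as Microsoft documents it: among the available paths, the one with the lowest weight is chosen. The `Path` class, the `pick_path` function, and the weight of 0 on the 10Gb path are assumptions made for the example; only the 10,000 weight comes from this post. It is a toy model and says nothing about how quickly the real DSM detects a failed or restored path.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    weight: int           # lower weight = higher priority under weighted paths
    healthy: bool = True


def pick_path(paths):
    """Return the healthy path with the lowest weight, or None if all are down."""
    candidates = [p for p in paths if p.healthy]
    return min(candidates, key=lambda p: p.weight, default=None)


# Hypothetical names and weights; only the 10,000 figure comes from the post.
ten_gb = Path("10Gb", weight=0)
one_gb = Path("1Gb", weight=10_000)
paths = [ten_gb, one_gb]

selected = []
selected.append(pick_path(paths).name)   # both paths up
ten_gb.healthy = False
selected.append(pick_path(paths).name)   # 10Gb link down
ten_gb.healthy = True
selected.append(pick_path(paths).name)   # 10Gb link restored
print(selected)
```

In this model the selection returns to the 10Gb path as soon as it is marked healthy again, which is the failback behaviour the weighting is meant to achieve.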

I'm not sure the resiliency is working, though, because the cluster occasionally reports the CSV as briefly unavailable.

Q1. In iSCSI I only have 1 portal group. It has 2 network portals, both with an index of 0. Is this correct? Some screenshots I've seen show "2 Portal Groups". (The QNAP and the servers each have 2 links/IP addresses on different subnets via different network switches.)

Q2. Should I tick "Path Verify Enable" on the disk devices? (It is currently not enabled.) What are the implications?

Q3. Should I change any of the settings, e.g. increase Disk Timeout or PDORemovePeriod? I am not sure whether increasing these would make it wait longer before using the 1Gb link.

Ultimately I want the iSCSI to be as resilient as possible, because I had a brief 10Gb network blip and Hyper-V corrupted the disks of the VMs. I guess the goal is to increase the time iSCSI traffic is queued/retried, but I don't understand the impact of the iSCSI/MPIO settings.
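
To reason about Q3, here is a deliberately simplified Python sketch. It assumes the commonly documented meaning of PDORemovePeriod, namely the number of seconds MPIO keeps the multipath device after every path to it has been lost, and it ignores Disk Timeout, retry counts, and iSCSI initiator timers. The function name and all the numbers are hypothetical, not recommendations or measured values.

```python
def device_survives(all_paths_down_s: float, pdo_remove_period_s: float) -> bool:
    """True if a total-path outage ends before the PDORemovePeriod timer expires.

    Simplifying assumption: the timer starts only when *all* paths are lost.
    """
    return all_paths_down_s < pdo_remove_period_s


# Hypothetical outage lengths (seconds) and timer values, for illustration only.
outages_s = [5, 25, 45]
for period_s in (20, 60):
    results = {o: device_survives(o, period_s) for o in outages_s}
    print(f"PDORemovePeriod={period_s}s -> {results}")
```

Under that assumption the timer only comes into play once both the 10Gb and the 1Gb paths are down, so raising it would lengthen how long I/O is held during a total outage rather than delay the use of a healthy 1Gb path. That reading should be checked against the storage vendor's guidance before changing anything in production.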

2 answers

  1. Alex Bykovskyi 2,241 Reputation points
    2024-04-24T19:36:29.5533333+00:00

    Hey,

    When mixing network speeds with iSCSI, we usually recommend the Failover Only multipathing policy. With it, the 1Gb interface is used only if the 10Gb link fails.

    As for specific multipathing settings, you should check with the storage vendor. Different vendors can have different recommendations. I haven't found anything recent from QNAP, so it might be better to contact their support.

    https://files.qnap.com/news/pressresource/product/How_to_connect_to_your_QNAP_Turbo_NAS_from_Windows_Server_2012_using_MPIO.pdf

    You can add storage redundancy to your configuration by using StarWind VSAN as shared storage. VSAN will create a replicated shared storage pool and share it via iSCSI with the Hyper-V hosts. In this scenario, your setup will be able to tolerate a storage failure. This might be helpful: https://www.starwindsoftware.com/best-practices/starwind-virtual-san-best-practices/

    Cheers,

    Alex Bykovskyi

    StarWind Software

    Note: Posts are provided “AS IS” without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

  2. Anonymous
    2024-04-25T04:19:25.0133333+00:00

    Hi Andy,

    Good day!

    Q1. Having multiple portal groups isn't strictly necessary unless you have specific requirements for separating traffic or managing redundancy at the iSCSI initiator level. Since you're using multiple network portals in the same portal group, with different IP addresses and subnets, that should be sufficient for redundancy.

    Q2. Enabling path verification can help detect and remove failed paths more quickly, improving overall resilience. It's generally a good idea to enable this feature, but it's essential to test it in your environment to ensure it behaves as expected without causing unnecessary disruptions.

    Q3. Increasing these values can indeed help prevent premature failovers to the 1Gb link during temporary network blips. However, you'll need to strike a balance between resilience and responsiveness. Longer timeouts can increase the time it takes to detect and react to actual failures, potentially impacting performance or causing delays in failover scenarios. It's recommended to adjust these settings incrementally while monitoring system performance and failover behavior to find the optimal balance.

    Best Regards,

    Ian Xue

