Network Load Balancing Troubleshooting

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2, Windows Server 2012

Troubleshooting

What problem are you having?

  • After installing Network Load Balancing and restarting a cluster host, a message appears: "The system has detected an IP address conflict with another system on the network..."

  • There is no response when using ping to access the cluster's IP address from an outside network.

  • There is no response when using ping to access a host's dedicated IP addresses from another cluster host.

  • When attempting to use Network Load Balancing Manager to connect to a host in your cluster, you receive the error "Host unreachable".

  • When using Telnet or attempting to browse a computer outside the cluster from a cluster host, there is no response.

  • When invoking the Network Load Balancing remote control commands from a computer outside the cluster, there is no response from one or more cluster hosts.

  • When using the dedicated IP address of a host to specify it as a target for a remote control command, there is no reply. However, specifying the host by its priority (ID) works.

  • Connectivity to the cluster is denied to some, but not all users.

  • You cannot view or change the Network Load Balancing properties using net config and Windows Management Instrumentation (WMI).

  • An unusual number of TCP connections to the cluster's IP address are being reset by either the server or the client.

  • Virtual Private Network (VPN) calls fail when you make a change that causes convergence (such as adding a host, removing a host, or draining a host).

  • After the cluster hosts start, they begin converging but never complete convergence.

  • The cluster moves in and out of a converged state.

  • After the cluster hosts start, Network Load Balancing reports that convergence has finished, but more than one host is a default host.

  • Network Load Balancing is not load balancing applications, and the default host handles all network traffic.

  • Traffic alternates unexpectedly between the cluster hosts breaking TCP connections.

  • Network traffic does not appear to load balance evenly among the cluster hosts.

  • When you are using Network Load Balancing with Microsoft Internet Security and Acceleration (ISA) Server, one cluster host might log blocked packets that are directed to the dedicated Internet Protocol (IP) address of another host.

  • You are unable to create a Network Load Balancing cluster in a 64-bit environment.

After installing Network Load Balancing and restarting a cluster host, a message appears: "The system has detected an IP address conflict with another system on the network..."

Possible Cause

The same IP address exists somewhere else on the network.

Solution

Choose a new IP address, or remove the duplicate address.

Possible Cause

You have configured different cluster operation modes (Unicast and Multicast) on the hosts, causing two different MAC addresses to map to the same IP address.

Solution

Ensure that all hosts are configured with the same cluster operation mode. For more information, see Enable Network Load Balancing.
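To see why mixed operation modes produce a conflict, the following minimal sketch derives the cluster MAC address that each mode would imply. It assumes the commonly documented Network Load Balancing convention (a 02-BF prefix in unicast mode and 03-BF in multicast mode, followed by the four octets of the cluster's IP address); this article does not state that convention, and the function name nlb_cluster_mac is hypothetical.

```python
import ipaddress

# Assumed prefixes: commonly documented for NLB, not stated in this article.
MODE_PREFIX = {"unicast": "02-BF", "multicast": "03-BF"}


def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Return the cluster MAC address implied by the cluster IP and mode."""
    if mode not in MODE_PREFIX:
        raise ValueError(f"unknown cluster operation mode: {mode!r}")
    octets = ipaddress.IPv4Address(cluster_ip).packed  # four raw bytes
    suffix = "-".join(f"{octet:02X}" for octet in octets)
    return f"{MODE_PREFIX[mode]}-{suffix}"


if __name__ == "__main__":
    ip = "10.0.0.5"  # example cluster IP address
    # Two hosts in different modes advertise different MACs for the same IP.
    print(nlb_cluster_mac(ip, "unicast"))
    print(nlb_cluster_mac(ip, "multicast"))
```

Under these assumptions, a host in unicast mode and a host in multicast mode answer for the same cluster IP address with different MAC addresses, which is the situation described in the cause above.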

Possible Cause

You configured the cluster's IP address before Network Load Balancing was bound to the network adapter.

Solution

Remove the cluster's IP address from TCP/IP properties, enable Network Load Balancing on the proper adapter as described in Enable Network Load Balancing and then configure the cluster's IP address.

Possible Cause

You added the cluster's IP address to a network adapter that has not been enabled for Network Load Balancing.

Solution

Remove the cluster's IP address from the incorrect adapter's TCP/IP properties, enable Network Load Balancing on the proper adapter as described in Enable Network Load Balancing, and then configure the cluster's IP address.

Possible Cause

You added the cluster's IP address in the TCP/IP dialog box, but you did not add the IP address correctly using the Network Load Balancing dialog box.

Solution

If you do not use Network Load Balancing Manager to configure your cluster, you must manually configure TCP/IP with the cluster's IP address. For more information, see Set up TCP/IP for Network Load Balancing.

There is no response when using ping to access the cluster's IP address from an outside network.

Verify that you can use ping to access the dedicated IP addresses for the cluster hosts from a computer outside the router. If this test fails and you are using multiple network adapters, the problem is unrelated to Network Load Balancing. If you are using a single network adapter for both the dedicated and cluster IP addresses, consider the following causes:

Possible Cause

You did not add the cluster's IP address correctly in the TCP/IP properties.

Solution

If you do not use Network Load Balancing Manager to configure your cluster, you must manually configure TCP/IP with the cluster's IP address. For more information, see Set up TCP/IP for Network Load Balancing.

Possible Cause

If you are using multicast support, you might find that your router has difficulty resolving the primary IP address into a multicast media access control (MAC) address using the ARP protocol.

Solution

To check this, verify that you can use ping to access the cluster from a client on the cluster's subnet and to access the cluster hosts' dedicated IP addresses from a computer outside the router. If these tests work properly, the router is probably at fault. You should be able to add a static ARP entry to the router to circumvent the problem. You can also turn off Network Load Balancing multicast support and use a unicast network address without a hub.
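The ping checks described above can be summarized as a small decision function. This is only an illustrative sketch: the function name, parameters, and result strings are hypothetical, and the three booleans stand for the outcome of ping tests that you run yourself.

```python
def diagnose_cluster_ping(cluster_from_subnet: bool,
                          dedicated_from_outside: bool,
                          cluster_from_outside: bool) -> str:
    """Interpret three manual ping results for a multicast-mode cluster.

    cluster_from_subnet:    ping to the cluster IP from a client on its subnet
    dedicated_from_outside: ping to the hosts' dedicated IPs from outside the router
    cluster_from_outside:   ping to the cluster IP from outside the router
    """
    if cluster_from_outside:
        return "cluster reachable from outside; no fault indicated"
    if cluster_from_subnet and dedicated_from_outside:
        # Both supporting tests pass, so the router's ARP handling is suspect.
        return "router is probably at fault; consider a static ARP entry"
    if not dedicated_from_outside:
        return "dedicated IP addresses unreachable from outside the router"
    return "cluster unreachable from its own subnet; check the cluster configuration"


if __name__ == "__main__":
    print(diagnose_cluster_ping(True, True, False))
```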

Possible Cause

When using Network Load Balancing in either multicast or unicast mode, routers need to be able to accept proxy ARP responses (IP-to-network address mappings that are received with a different network source address in the Ethernet frame).

Solution

Make sure that your router has proxy ARP support turned on. You can also set a static ARP entry to keep proxy ARP support disabled in the router.

Possible Cause

Internet Control Message Protocol (ICMP) to the cluster is blocked by a router or firewall.

Solution

Allow ICMP traffic through the firewall or router. Be aware that this may expose your system to additional security risk.

There is no response when using ping to access a host's dedicated IP addresses from another cluster host.

Possible Cause

You did not add the host's dedicated IP addresses correctly in TCP/IP properties.

Solution

You must manually configure TCP/IP with the host's dedicated IP address. For more information, see Set up TCP/IP for Network Load Balancing.

Possible Cause

When using Network Load Balancing in either multicast or unicast mode, routers need to be able to accept proxy ARP responses (IP-to-network address mappings that are received with a different network source address in the Ethernet frame).

Solution

Make sure that your router has proxy ARP support turned on. You can also set a static ARP entry to keep proxy ARP support disabled in the router.

Possible Cause

Internet Control Message Protocol (ICMP) to the cluster is blocked by a router or firewall.

Solution

Allow ICMP traffic through the firewall or router. Be aware that this may expose your system to additional security risk.

When attempting to use Network Load Balancing Manager to connect to a host in your cluster, you receive the error "Host unreachable".

Cause:  Internet Control Message Protocol (ICMP) to the host is either blocked by a router or firewall, or disabled on the host's network adapter.

Solution:  Enable ICMP on the host's network adapter or allow ICMP traffic through the firewall or router. Be aware that this may expose your system to additional security risk. You can also use Network Load Balancing Manager's /noping option. For more information, see Nlbmgr.

When using Telnet or attempting to browse a computer outside the cluster from a cluster host, there is no response.

Cause:  Verify that you can use ping to access the computer outside the cluster. If this test is successful you might not have listed the host's dedicated IP address first in the TCP/IP properties.

Solution:  When you must manually configure TCP/IP with the host's dedicated IP addresses and the cluster's IP address, the dedicated IP addresses must be listed first. For more information, see Set up TCP/IP for Network Load Balancing.
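The ordering requirement in the solution above can be checked mechanically once you have the adapter's IP addresses in the order they appear in TCP/IP properties. The sketch below assumes you have collected that list by hand; the function name and the example addresses are hypothetical.

```python
def dedicated_ip_listed_first(adapter_ips, dedicated_ip):
    """Return True if the dedicated IP address is the first entry in the list.

    adapter_ips is the ordered list of addresses bound to the adapter,
    exactly as shown in its TCP/IP properties.
    """
    return bool(adapter_ips) and adapter_ips[0] == dedicated_ip


if __name__ == "__main__":
    cluster_ip = "10.0.0.5"     # example cluster IP address
    dedicated_ip = "10.0.0.11"  # example dedicated IP address
    print(dedicated_ip_listed_first([dedicated_ip, cluster_ip], dedicated_ip))
    print(dedicated_ip_listed_first([cluster_ip, dedicated_ip], dedicated_ip))
```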

If ping fails to access the computer outside of the cluster, refer to the problems labeled "There is no response when using ping to access the cluster's IP address from an outside network" and "There is no response when using ping to access a host's dedicated IP addresses from another cluster host" described earlier in this troubleshooting information.

When invoking the Network Load Balancing remote control commands from a computer outside the cluster, there is no response from one or more cluster hosts.

Possible Cause

Remote control commands are not being sent to the cluster's IP address. Commands must be sent to the cluster's primary IP address that was assigned in the Network Load Balancing Properties dialog box.

Solution

Be sure that you send remote commands to the correct IP address.

Possible Cause

The remote control traffic is being encrypted by Internet Protocol security (IPSec). Network Load Balancing remote control commands will not work correctly if they are sent from a computer on which IPSec is configured to encrypt the remote control traffic.

Solution

Disable IPSec.

For more information, see Internet Protocol Security (IPSec).

Possible Cause

Remote control is not enabled.

Solution

Enable remote control in the Network Load Balancing Properties dialog box.

For more information, see Configure cluster parameters.

Caution

  • The Network Load Balancing remote control option presents many security risks, including the possibility of data tampering, denial of service and information disclosure. It is highly recommended that you do not enable remote control and instead use Network Load Balancing Manager or other remote management tools such as Windows Management Instrumentation (WMI).

    If you choose to enable remote control, it is vital that you restrict access by specifying a strong remote control password. It is also imperative that you use a firewall to protect the Network Load Balancing UDP control ports (the ports receiving remote control commands) in order to shield them from outside intrusion. By default, these are ports 1717 and 2504 at the cluster's IP address. Use remote control only from a secure, trusted computer within your firewall. For more information on the remote control parameter, see Remote control in Network Load Balancing parameters. For more information about strong passwords, see Strong passwords.

Possible Cause

Network Load Balancing UDP control ports are protected incorrectly by a firewall. By default, remote control commands are sent to UDP ports 1717 and 2504 at the cluster IP address.

Solution

Be sure that these ports have not been blocked incorrectly by a router or firewall. You can also change the port number by modifying the corresponding Network Load Balancing parameter.

Possible Cause

You used an incorrect password when attempting to use remote control.

Solution

Use the remote control password that you configured when you enabled the Network Load Balancing cluster.

When using the dedicated IP address of a host to specify it as a target for a remote control command, there is no reply. However, specifying the host by its priority (ID) works.

Cause:  None of the hosts has a dedicated IP address.

Solution:  Assign a dedicated IP address to each host. For more information, see Configure host parameters.

Connectivity to the cluster is denied to some, but not all users.

Possible Cause

An application being load balanced is not responding.

Solution

This is an application-specific problem not related to Network Load Balancing. Refer to your application's documentation for correction. You may need to stop and restart the application.

Possible Cause

If your cluster is configured for unicast mode, a switch might have learned the Network Load Balancing network adapter's MAC address.

Solution

Clear the switch's port to MAC address mapping.

Possible Cause

The cluster's IP address was not added to TCP/IP on one or more of the hosts.

Solution

If you do not use Network Load Balancing Manager to configure your cluster, you must manually configure TCP/IP with the cluster's IP address. For more information, see Set up TCP/IP for Network Load Balancing.

Possible Cause

A host is leaving the cluster because of a drainstop or stop command, but convergence did not complete correctly.

Solution

Wait for convergence to complete. If convergence does not complete, see the problem titled "After the cluster hosts start, they begin converging but never complete convergence" later in this troubleshooting guide.

You cannot view or change the Network Load Balancing properties using net config and Windows Management Instrumentation (WMI).

Cause:  In order to view or change Network Load Balancing properties, you must be a member of the Administrators group.

Solution:  Log on as a user who is in the local Administrators group of the computer that is running Network Load Balancing.

An unusual number of TCP connections to the cluster's IP address are being reset by either the server or the client.

Possible Cause

HTTP keep-alives are enabled on the Network Load Balancing hosts and keep-alive enabled clients are connecting to the cluster.

Solution

Disable HTTP keep-alives. For more information on HTTP keep-alives and Internet Information Services (IIS), refer to the IIS documentation set. To view the IIS documentation set from your desktop, install IIS then click the Start button, click Run, and type the following command in the Open text box: %windir%\help\iisrv.chm.

Possible Cause

Low system resources on the server are causing TCP itself to reject the connections.

Solution

Free up system resources by, for example, adding system memory or closing unnecessary applications.

Possible Cause

The cluster has diverged into two separately converged clusters, causing more than one node to claim ownership of every connection.

Solution

Remove the two clusters then recreate a single cluster.

Virtual Private Network (VPN) calls fail when you make a change that causes convergence (such as adding a host, removing a host, or draining a host).

Cause:  When using Network Load Balancing to load balance VPN traffic such as PPTP/GRE and IPSec/L2TP, you must configure the port rules that govern the ports handling the VPN traffic (TCP port 1723 for PPTP and UDP port 500 for IPSec) to use either Single or Class C affinity.

Solution:  Configure the port rules governing ports 500 and 1723 to use either Single or Class C affinity. For more information, see Network Load Balancing parameters.
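As a rough illustration of the affinity requirement above, the sketch below checks a hand-entered list of port rules for the two VPN ports. The PortRule structure and its field values are hypothetical stand-ins for what you see in the Network Load Balancing Properties dialog box, not an actual Network Load Balancing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRule:
    """Hypothetical record of one port rule, transcribed by hand."""
    start_port: int
    end_port: int
    protocol: str  # "TCP", "UDP", or "Both"
    affinity: str  # "None", "Single", or "ClassC"


# Ports named in this article: TCP 1723 for PPTP and UDP 500 for IPSec.
VPN_PORTS = [(1723, "TCP"), (500, "UDP")]


def vpn_affinity_problems(rules):
    """Return a list of messages for VPN ports lacking Single or Class C affinity."""
    problems = []
    for port, proto in VPN_PORTS:
        matching = [r for r in rules
                    if r.start_port <= port <= r.end_port
                    and r.protocol in (proto, "Both")]
        if not matching:
            problems.append(f"{proto} {port}: no port rule covers this port")
        for rule in matching:
            if rule.affinity not in ("Single", "ClassC"):
                problems.append(
                    f"{proto} {port}: affinity is {rule.affinity}, "
                    "expected Single or Class C")
    return problems


if __name__ == "__main__":
    rules = [PortRule(1723, 1723, "TCP", "None"),
             PortRule(500, 500, "UDP", "Single")]
    for message in vpn_affinity_problems(rules):
        print(message)
```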

After the cluster hosts start, they begin converging but never complete convergence.

Possible Cause

Either a different number of port rules or incompatible port rules on different cluster hosts were entered. This will inhibit convergence.

Solution

Open the Network Load Balancing Properties dialog box on each cluster host and verify that all hosts have identical port rules.
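If you record each host's port rules by hand, a comparison like the following can highlight hosts whose rules differ. This is a minimal sketch: the tuple layout (start port, end port, protocol, filtering mode, affinity) and the host names are assumptions made for illustration, and the data must be transcribed from each host's properties.

```python
def port_rule_mismatches(rules_by_host):
    """Return, per host, the rules that exist on some other host but not on it.

    rules_by_host maps a host name to a list of rule tuples. Hosts whose
    rule set already matches the union of all rules are omitted.
    """
    all_rules = set()
    for rules in rules_by_host.values():
        all_rules.update(rules)

    mismatches = {}
    for host, rules in rules_by_host.items():
        missing = all_rules - set(rules)
        if missing:
            mismatches[host] = sorted(missing)
    return mismatches


if __name__ == "__main__":
    http_rule = (80, 80, "TCP", "Multiple", "Single")
    https_rule = (443, 443, "TCP", "Multiple", "Single")
    hosts = {"HOST1": [http_rule], "HOST2": [http_rule, https_rule]}
    print(port_rule_mismatches(hosts))
```

An empty result means every host lists the same set of rules; any entry names a host that should be corrected before convergence is retried.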

Possible Cause

You have a bad network adapter or cable.

Solution

Use the ping command to test connectivity. Ping the host's fully qualified domain name. You can also learn more about the problem by pinging your domain controller by IP address, and by pinging other network servers by name and IP address.

Possible Cause

You have mismatched duplex settings on a switch or hub.

Solution

Confirm that the duplex settings in each of your switches and hubs are configured appropriately.

Possible Cause

The dedicated IP address that you used for one of the hosts already exists somewhere else on the network.

Solution

Choose a new IP address, or remove the duplicate address.

Possible Cause

You defined one or more port rules that are applicable to only specific IP addresses, but your cluster contains hosts running Windows 2000.

Solution

If you use IP address specific port rules, your cluster must be running a product in the Windows Server 2003 family on all hosts.

Possible Cause

You have configured different cluster operation modes (Unicast and Multicast) on the hosts.

Solution

Ensure that all hosts are configured with the same cluster operation mode. For more information, see Enable Network Load Balancing.

The cluster moves in and out of a converged state.

Cause:  Heartbeats are being missed due to intermittent network connectivity caused by a bad network adapter or cable, or other network problems.

Solution:  Use the ping command to test connectivity. Ping the host's fully qualified domain name. You can also learn more about the problem by pinging your domain controller by IP address, and by pinging other network servers by name and IP address.

After the cluster hosts start, Network Load Balancing reports that convergence has finished, but more than one host is a default host.

Possible Cause

The cluster hosts have become members of different subnets, so that all hosts are not accessible on the same network.

Solution

Be sure that all cluster hosts can communicate with each other.

Possible Cause

A layer three switch is being used.

Solution

Put a layer two switch between the hosts and the layer three switch.

Possible Cause

A break in a redundant switch caused the cluster to separate into two separate clusters, creating two separate default hosts.

Solution

Remove the two clusters then recreate a single cluster.

Possible Cause

Your switch is configured to reject broadcast packets.

Solution

Configure your switch to accept broadcast packets (be aware that this might introduce certain security risks), or configure your Network Load Balancing cluster to use multicast mode.

Possible Cause

One host is unable to send or receive heartbeats.

Solution

Use the ping command to test connectivity to each of the hosts. Ping each host's fully qualified domain name.

Possible Cause

A host is plugged into the wrong port on the switch.

Solution

Use the correct port on the switch.

Network Load Balancing is not load balancing applications, and the default host handles all network traffic.

Possible Cause

A port rule is missing. By default, Network Load Balancing directs all incoming network traffic not governed by port rules to the default host. This ensures that applications you do not want load balanced behave properly.

Solution

To load balance an application across the cluster, create a port rule on every cluster host for the TCP/IP port(s) serviced by the application.
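The default-host behavior described above can be modeled with a simplified sketch. It assumes that the default host is the one with the lowest numeric host priority (ID), which is the usual Network Load Balancing convention but is not stated in this article, and it ignores protocols, single-host rules, and disabled rules. All names are hypothetical.

```python
def handling_host_for_port(port, port_rules, host_priorities):
    """Describe which host or hosts handle traffic arriving on a port.

    port_rules:      list of (start_port, end_port) ranges covered by rules
    host_priorities: dict mapping host name to its priority (ID)
    """
    covered = any(start <= port <= end for start, end in port_rules)
    if covered:
        return "load balanced according to the port rule"
    # Assumption: the lowest priority number identifies the default host.
    default_host = min(host_priorities, key=host_priorities.get)
    return f"default host {default_host}"


if __name__ == "__main__":
    rules = [(80, 80)]
    priorities = {"HOST1": 2, "HOST2": 3}
    print(handling_host_for_port(80, rules, priorities))
    print(handling_host_for_port(8080, rules, priorities))
```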

Possible Cause

You added a second host to a single host cluster, but the second host is not configured correctly. The cluster never converges and the original host continues to handle all of the traffic.

Solution

Carefully review and, if necessary, correct each of the settings on the second host, such as the cluster IP address, dedicated IP address, and port rules.

Possible Cause

If your cluster is configured for unicast mode, a switch might have learned the Network Load Balancing network adapter's MAC address.

Solution

Clear the switch's port to MAC address mapping.

Possible Cause

A proxy server is sending all connections using a single IP address to your cluster in single affinity mode.

Solution

Configure your proxy server to use multiple IP addresses.

Traffic alternates unexpectedly between the cluster hosts breaking TCP connections.

Cause:  Unicast network addresses are causing problems with the switching hub. If you are using a switching hub to interconnect the cluster hosts, you must use Network Load Balancing multicast support; otherwise, the switch is likely to behave erratically when the same unicast network address is used on multiple switch ports.

Solution:  Check that you have selected multicast support in the Network Load Balancing Properties dialog box. If you do not want to use multicast support, you can interconnect the cluster hosts with a hub or coaxial cable instead of a switch.

Network traffic does not appear to load balance evenly among the cluster hosts.

Cause:  The network traffic is coming from a limited number of IP addresses, possibly due to the use of a proxy server.

Solution:  Configure your proxy server to use multiple IP addresses.
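The effect of a proxy on load distribution can be illustrated with a toy model. When the handling host is chosen from the client's IP address, every connection that arrives from one proxy address lands on the same host. The hash below is a stand-in chosen for this sketch only; it is not the algorithm that Network Load Balancing uses, and the host names and addresses are made up.

```python
import hashlib
from collections import Counter


def pick_host(client_ip, hosts):
    """Map a client IP address to one host (illustrative hash, not NLB's)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]


def distribution(client_ips, hosts):
    """Count how many connections each host would receive."""
    return Counter(pick_host(ip, hosts) for ip in client_ips)


if __name__ == "__main__":
    hosts = ["HOST1", "HOST2", "HOST3"]
    # 100 connections that all appear to come from a single proxy address.
    proxied = ["203.0.113.7"] * 100
    # 100 connections from 100 distinct client addresses.
    direct = [f"198.51.100.{n}" for n in range(1, 101)]
    print("via proxy:", dict(distribution(proxied, hosts)))
    print("direct:   ", dict(distribution(direct, hosts)))
```

In this model the proxied traffic is counted against exactly one host, while the distinct client addresses spread across several, which mirrors why giving the proxy multiple IP addresses helps.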

Notes

  • These topics describe several common problems that you might encounter when installing and initially using Network Load Balancing. Each topic describes the likely reasons for each problem and one or more suggested remedies.

  • These topics assume that your system and applications meet the minimum requirements for Network Load Balancing. For more information, see Checklist: Enabling and configuring Network Load Balancing and Determining which applications to use with Network Load Balancing.

  • Many of the problems described here can be avoided, diagnosed and corrected by using Network Load Balancing Manager. For more information, see Network Load Balancing Manager.

  • For additional troubleshooting support, go to the Microsoft Product Support Services Web site by clicking on the appropriate link at Support resources and performing a search on "Network Load Balancing", "Nlb", or "Windows Server 2003."

  • You should test your network and all network adapters for proper operation before installing Network Load Balancing. Be sure to follow all installation steps and check that the cluster parameters and port rules are identically set for all cluster hosts. If a problem occurs, always check the Windows event log for a message from the Network Load Balancing driver. For more information, see Cluster parameters, Host parameters, and Port rules in Network Load Balancing parameters.

When you are using Network Load Balancing with Microsoft Internet Security and Acceleration (ISA) Server, one cluster host might log blocked packets that are directed to the dedicated Internet Protocol (IP) address of another host.

Cause:  One of the cluster hosts is configured with a host priority identifier equal to 1.

Solution:  Do not configure any cluster host with a host priority identifier of 1. Instead, use numbers that are greater than 1. For more information, see Configure host parameters.

You are unable to create a Network Load Balancing cluster in a 64-bit environment.

Cause:  You might not be running the appropriate Network Load Balancing version for the environment you are using. Network Load Balancing cannot form a cluster when the 32-bit version of Network Load Balancing is used on a 64-bit computer. This issue might have gone undetected because 32-bit Network Load Balancing components (nlb.exe, wlbs.exe, and nlbmgr.exe) appear to run correctly in the 64-bit environment.

Solution:  If you plan to use a 64-bit environment, you must use the 64-bit Network Load Balancing version.