I'd like to pose a problem I'm looking at here, and have no real idea of a root cause. We're trying to run a backup from one server (node A) on one network over a firewall to backupserver (node B) in another network. And that started failing for whatever reason.
Long story short: ping traffic between node A and B dies, when node B tries to ping back to node A.
I was under the impression the firewall appliance was to blame (we replaced it with a newer model and thus had to rebuild the config), but testing seems to rule that out.
Basic understanding of the situation:
NODE A ---------------- FW ---------------- NODE B | | NODE C -- -- NODE D
All nodes are Windows machines. B needs to connect to A.
Nodes A and C are in the same network, and so are nodes B and D.
FW is the firewall appliance.
Pings below are all done based on IP address, so the DNS isn't a part in this.
When I initiate a perpetual ping from A to B, responses come normally.
If I ping from B to A this also works as expected, but at the point of the first response being received by machine B, the ping from A to B dies (request timed out). Only after a while of no ping there's apparently some 'reset' somewhere, and the ping from A to B is possible again. During this time the ping from B to A remains working.
I've disabled the Windows firewalls on both A an B, to no change in behavior.
As a test I'm also trying to ping from A to D. This works normally, and when pinging from D to A no change in the ping from A to D happens. Either side can ping the other without any issue. So the problem does not seem to be on the side of A.
Node C is a laptop, which I've tried the same with... But any ping to A, B or D works, and does not timeout if I try the ping from A, B or D to C (which also work as expected).
In order to further troubleshoot this, I moved C from the left side to the right side of the firewall, and set it up with the same IP address as B. Disconnected B from the network, plugged in C, and redid the test. Ping from A to C and C to A remains in working order either way. So that kind of narrows it down to an issue on node B, seeing the firewall and rules remain the same for the communication.
If I add a secondary IP to node B, the problem remains with the primary IP address, but the secondary IP address remains working as expected. When I change the IP of node B the whole problem would probably go away, but this would infer a lot of reconfiguration in the backup with replica-partners and source servers and the like... So if possible I'd like to leave the IP as-is.
Something on node B with regards to the IP stack seems to go wrong, but I can't really see anything I might be able to do to further narrow down the cause. Does anyone have any ideas here on what I might try or check to resolve this issue?