50% of all connections between on-premises and Azure via ExpressRoute fail.
In Azure we have deployed some Linux VMs to the same VNet in which the ExpressRoute gateway exists, without any NSGs or UDRs. All resources are deployed in Switzerland North. The ExpressRoute is connected to Zurich.
If we run a PsPing from an on-premises VM to an Azure VM on port 22, for every second connection we see the SYN being sent but no SYN/ACK comes back. The other 50% of connections work fine (visible in Wireshark).
We also found that a ping from one and the same on-premises VM to the Azure VMs works only for even target IPs and fails for all odd target IPs.
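To make the even/odd pattern reproducible for anyone helping with this, the observation can be captured with a small script. The following is only a minimal sketch: it assumes Python 3 is available on the on-premises VM, the target addresses are placeholders to be replaced with the real Azure VM IPs, and the helper names (`tcp_probe`, `parity`, `summarize`) are made up for illustration. It opens TCP connections to port 22 and tallies successes and failures per parity group of the target IP.

```python
import ipaddress
import socket
from collections import defaultdict


def tcp_probe(ip: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to ip:port completes within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def parity(ip: str) -> str:
    """Classify an IPv4 address as 'even' or 'odd' by its integer value."""
    return "even" if int(ipaddress.IPv4Address(ip)) % 2 == 0 else "odd"


def summarize(targets, probe=tcp_probe, attempts=10):
    """Probe each target several times and tally results per parity group."""
    stats = defaultdict(lambda: {"ok": 0, "fail": 0})
    for ip in targets:
        for _ in range(attempts):
            key = "ok" if probe(ip) else "fail"
            stats[parity(ip)][key] += 1
    return {group: dict(counts) for group, counts in stats.items()}


if __name__ == "__main__":
    # Placeholder addresses (assumption): replace with the real Azure VM IPs.
    targets = ["10.0.0.4", "10.0.0.5", "10.0.0.6", "10.0.0.7"]
    for group, counts in sorted(summarize(targets).items()):
        print(group, counts)
```

Since every attempt opens a new connection with a new source port, a clean split by target parity in the output would suggest the failures depend on the IP addresses rather than on the individual connection.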
An on-premises tracert to any of the Azure IPs shows two hops (the local router/firewall and the IP address assigned to the ExpressRoute circuit) for all IPs where ping works. For all IPs where ping fails, we see only the next hop and no further hops.
For connections from an Azure VM to on-premises, we always see the incoming request on the on-premises VM (in Wireshark) and we see the response being sent, but 50% of these replies get lost.
We are sure that all packets leave the switch to which the ExpressRoute is connected.
The ExpressRoute provider (in our case Digital Realty) confirmed that they do not have any Layer 3 components between our switch and the Microsoft router.
The behavior looks like a Layer 3 routing issue.
We also tested with one ExpressRoute link disabled (to avoid asymmetric routing), but still see the same behavior.
We disabled IPv4 peering on the ExpressRoute and re-enabled it, but the issue remains.
We then enabled ICMP from the on-premises VMs to the circuit IPs and see the same issue: from an on-premises VM with an even IP we can ping the circuit IP, but from an on-premises VM with an odd IP we do not get a reply.
From the router/firewall, which is in the same subnet as the circuit IP, ping works.
How can we fix this and get our ExpressRoute connection fully working?