Virtual WAN VPN Gateway goes down after editing routing labels for some VNets

LukeCloudWalker-6128 36 Reputation points
2022-05-28T08:04:49.42+00:00

Hi guys,

We have a weird behaviour with our vWan setup.

I have a VPN configured (this is the only production site connected to the vWAN at the moment) between a CheckPoint and the VPN Gateway. BGP is configured.

I have a few VNETs connected to the hub, and connectivity is fine when everything goes through the DefaultRouteTable.

I am now running some tests for isolation purposes. All VNETs are supposed to share the same route table, so I decided to test the labels.

So basically the setup is :
VPN is by design associated to the DefaultRouteTable.
VNET 1 is associated to its own RouteTable labeled VNET1
VNET 2 is associated to its own RouteTable labeled VNET2

Scenario :
I configure VNET1 connection to propagate routes to Default label
I configure VNET2 connection to propagate routes to Default label
I configure Default route table to propagate to labels VNET1 and VNET2

=> If I check connectivity, I can reach interfaces on both VNET1 and VNET2 from on-premises.

Problem :
If I remove the Default label from the propagation for the VNET2 connection, I lose connectivity to both VNETs.
The reason is that the VPN Gateway is somehow reconfigured and BGP goes down on both gateway instances.

Logs are showing the following :
[Screenshot: 206340-vpnlogs.png]

The expected behaviour would be to lose connectivity between on-prem and VNET2 only, not the whole VPN going nuts.
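For illustration, the expected behaviour can be sketched as a small Python model. This is only a simplified reading of how association and propagation labels interact, not Azure's implementation. The table names `rt-vnet1` and `rt-vnet2`, the helper functions, and the assumption that the VPN connection is the one propagating to labels VNET1 and VNET2 are all hypothetical.

```python
# Route tables and the labels attached to them (names are illustrative).
TABLES = {
    "defaultRouteTable": {"default"},
    "rt-vnet1": {"VNET1"},
    "rt-vnet2": {"VNET2"},
}

def learned_routes(connections):
    """Map each route table to the connections whose routes propagate into it."""
    learned = {table: set() for table in TABLES}
    for name, conn in connections.items():
        for table, labels in TABLES.items():
            # A connection's routes land in every table carrying one of its labels.
            if labels & conn["propagate_to"]:
                learned[table].add(name)
    return learned

def reachable(a, b, connections):
    """Two-way reachability: each side's associated table must hold the other's routes."""
    learned = learned_routes(connections)
    return (b in learned[connections[a]["associated"]]
            and a in learned[connections[b]["associated"]])

def scenario(vnet2_labels):
    """Build the setup from the post, varying only VNET2's propagation labels."""
    return {
        "vpn":   {"associated": "defaultRouteTable", "propagate_to": {"VNET1", "VNET2"}},
        "vnet1": {"associated": "rt-vnet1", "propagate_to": {"default"}},
        "vnet2": {"associated": "rt-vnet2", "propagate_to": set(vnet2_labels)},
    }

before = scenario({"default"})   # VNET2 propagates to the default label
after = scenario(set())          # default label removed from VNET2

print(reachable("vpn", "vnet1", before), reachable("vpn", "vnet2", before))  # True True
print(reachable("vpn", "vnet1", after), reachable("vpn", "vnet2", after))    # True False
```

In this model, removing the default label from VNET2 only affects the on-prem/VNET2 pair, which is the outcome expected here. It says nothing about why the gateway itself gets reconfigured.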

There must be something I'm missing, but I don't get it.

Has anyone experienced this before?

Azure Virtual WAN
An Azure virtual networking service that provides optimized and automated branch-to-branch connectivity.
Azure VPN Gateway
An Azure service that enables the connection of on-premises networks to Azure through site-to-site virtual private networks.
Azure Virtual Network
An Azure networking service that is used to provision private networks and optionally to connect to on-premises datacenters.

5 answers

  1. risolis 8,701 Reputation points
    2022-05-28T20:13:23.46+00:00

    Hello @LukeCloudWalker-6128

    Thanks for your post.

    I would like to clarify one statement which is the one below:

    editing routing labels for some vnet's.

    Are you editing the labels or editing the route table? If I am not mistaken, you need to delete it and recreate it for this scenario.

    Also, I would like to understand the routing effect you intend to see, because you could use a route filter on-premises to avoid having the VNet2 route prefix on your VPN tunnel. (I do not know whether this tunnel runs over the internet or an ExpressRoute circuit.)

    Looking forward to your feedback,

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


  2. LukeCloudWalker-6128 36 Reputation points
    2022-05-28T20:37:14.76+00:00

    Ola @risolis ,

    Thanks for taking the time.

    I'm not editing the labels or the route table directly.

    My first idea for managing routing across VPN/ER/VNets was to create a route table for each VNet, with one label only,
    and then play with the "propagate to labels" property of each of the VNet connections.

    What disrupts the VPN gateway and triggers the logs I mentioned in my first post is simply removing (or adding) the default label in the VNET1 connection's propagate property. The VPN gateway then flaps...

    NB: This is a VPN over the internet. We're planning on rerouting our on-prem traffic via ER and maybe keeping the VPN as a backup for priority traffic.


  3. LukeCloudWalker-6128 36 Reputation points
    2022-05-28T23:54:29.02+00:00

    NAT is not used at all from the network addressing plan perspective.
    I am not aware of having a NAT feature enabled or disabled anywhere on the vWAN.

    The .1 IP is indeed our on-prem VIP for the tunnels on a CheckPoint cluster, which I don't manage, but the network team followed the instructions from the CheckPoint KB.

    Sorry for the mistakes, I'm writing from my phone via VDI... a pain.

    I'll try to provide a topology of this vWAN configuration, which is quite simple.


  4. risolis 8,701 Reputation points
    2022-05-31T06:06:44.797+00:00

    Hello @LukeCloudWalker-6128

    Thanks a lot for your explanation as well as all the details given before.

    I would like to start from the peering output as it is shown below:

    == Peers ==

    "Local address","Peer address","Gateway instance","ASN","Status","Connected duration","Routes received","Messages sent","Messages received"
    "10.0.0.12","169.254.21.1","Instance0","64514","Connected","0:17:18","44","155","125"
    "10.0.0.13","169.254.21.1","Instance1","64514","Connected","0:17:14","44","157","126"

    Based on the output above, your Virtual WAN hub VPN gateway has two instances, Instance0 ("10.0.0.12") and Instance1 ("10.0.0.13"), both peering with the same peer address, 169.254.21.1 (the floating IP or virtual IP on your Checkpoint cluster).

    So, to me this looks like having just one tunnel. The best practice would be a separate tunnel IP address for each instance, but you can use just one if that is your intended network design.
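As a rough illustration of the observation above, the peer table can be parsed to list which gateway instances share a BGP peer address. This is only a sketch using the two rows quoted in this answer; the `instances_by_peer` helper is hypothetical and not part of any Azure tooling.

```python
import csv
import io
from collections import defaultdict

# The two peer rows quoted above, with their header line.
PEERS = '''"Local address","Peer address","Gateway instance","ASN","Status","Connected duration","Routes received","Messages sent","Messages received"
"10.0.0.12","169.254.21.1","Instance0","64514","Connected","0:17:18","44","155","125"
"10.0.0.13","169.254.21.1","Instance1","64514","Connected","0:17:14","44","157","126"
'''

def instances_by_peer(text):
    """Group gateway instances by the BGP peer address they connect to."""
    by_peer = defaultdict(list)
    # skipinitialspace tolerates a space after the comma, as in the raw output.
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        by_peer[row["Peer address"]].append(row["Gateway instance"])
    return dict(by_peer)

# Peer addresses used by more than one gateway instance.
shared = {peer: inst for peer, inst in instances_by_peer(PEERS).items() if len(inst) > 1}
print(shared)  # {'169.254.21.1': ['Instance0', 'Instance1']}
```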

    Having said that, we can take a look at the next explanation below from public documentation:

    An Azure Virtual WAN connection is composed of 2 tunnels. A Virtual WAN VPN gateway is deployed in a virtual hub in active-active mode, which implies that there are separate tunnels from on-premises devices terminating on separate instances. This is the recommendation for all users. However, if the user chooses to only have 1 tunnel to one of the Virtual WAN VPN gateway instances, if for any reason (maintenance, patches etc.) the gateway instance is taken offline, the tunnel will be moved to the secondary active instance and the user may experience a reconnect. BGP sessions won't move across instances

    That is the first observation but let me continue with the other ones :)

    - In your network topology I can see sub-project1-prd, sub-project1-dev and sub-infra-prd.

    So my question is: are they separate subscriptions, or resource groups located in different regions?

    - I have seen some default routes with different origin (source peer) addresses, "10.0.0.68" and "10.0.0.69".

    Do they belong to your on-premises IP allocation?

    - If you want to keep the current design, I am wondering whether you have applied a "next-hop-self" policy under BGP on your Checkpoint firewall. Could you please confirm?

    Looking forward to your feedback,

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


  5. LukeCloudWalker-6128 36 Reputation points
    2022-07-13T12:26:53.573+00:00

    Hi there,

    Thanks for providing assistance, especially to @risolis .

    I got feedback from support.

    It is by design like this. No joke...

    This would be enough to reconsider the vWan architecture and go back to hub and spoke with NVA instead of vWAN + AzureFirewall.