External Services for EVPN VXLAN

In this article I will investigate external services for a data center fabric where:

  • EVPN is used for the control plane and VXLAN for the data plane.
  • Multi-tenancy is configured with L3 VRFs.
  • A symmetric anycast gateway is configured on all tenant VLANs for better scalability, distributing routing across the data center.
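
For concreteness, here is a minimal sketch of what the anycast gateway and tenant VRF pieces might look like on a leaf (assuming Arista EOS syntax; the VRF name, VLAN, VNIs and addresses are hypothetical):

vrf instance TENANT_A
!
ip virtual-router mac-address 00:1c:73:00:00:01   # shared anycast MAC, identical on every leaf
!
interface Vlan10
   vrf TENANT_A
   ip address 10.0.10.2/24                # per-leaf address
   ip virtual-router address 10.0.10.1    # anycast gateway IP, identical on every leaf
!
interface Vxlan1
   vxlan vlan 10 vni 10010                # L2 VNI for the tenant VLAN
   vxlan vrf TENANT_A vni 50001           # L3 VNI used for symmetric IRB routing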

Attaching an external service is essentially a matter of installing an EVPN type-5 route in the appropriate tenant's routing table.
I will focus on attaching a firewall that provides internet access for tenant networks. The firewall announces a default route towards the fabric switches.

Physical connection and topology

Dual firewalls for redundancy: a single firewall cluster in an active-passive setup.
Dual fabric edge devices for redundancy: two border leaf switches are used; EVPN ESI or multi-chassis options can be used.

Let's first look at the active-passive firewall setup. Both firewalls must have identical configurations. During fail-over, the passive firewall takes over all control-plane and data-plane traffic. If there is a BGP session or any other layer 3 routing adjacency with remote devices, those neighborships must move to the second device. However, most vendors do not replicate control-plane state for routing protocols such as OSPF or BGP to the passive device, and with short hello timers doing so is not even feasible. The result is session re-establishment for most routing protocols.

Since both firewalls have identical configurations, the remote device must present an identical layer 3 configuration on its interfaces towards both firewalls.

This requirement can only be met by using logical interfaces for the layer 3 configuration, so plain layer 3 interfaces cannot be used. Put simply, routed interfaces cannot be used for both interfaces on the remote device, because two routed interfaces cannot carry an identical layer 3 setup.


Both ports have to be switched ports, and an SVI is configured for establishing the layer 3 adjacency with the firewall.


In that case, after a fail-over, routing adjacencies can be re-established between the newly active firewall and the remote device. Graceful restart capability can help here, though it can conflict with BFD, which otherwise provides sub-second failure detection.
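
As a sketch, the switched-port plus SVI arrangement on the remote device could look like this (Arista-style syntax; the VLAN, addresses and AS numbers are made up, and graceful-restart knobs vary by platform):

vlan 100
   name FW_TRANSIT
!
interface Ethernet1                       # link towards the firewall cluster
   switchport access vlan 100
!
interface Vlan100
   ip address 192.0.2.1/29                # the same SVI faces both firewall units
!
router bgp 65001
   neighbor 192.0.2.4 remote-as 65100     # firewall address on the transit VLAN
   neighbor 192.0.2.4 bfd                 # sub-second detection; weigh against graceful restart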

Now, if we want more redundancy, we can add a second remote device.

It is obvious that we have to use SVI interfaces on the remote devices, but we have two options for connecting to them:

  • Both interfaces on the active firewall can be dedicated interfaces, each with its own layer 3 configuration. This requires a second layer 3 routing adjacency over the second interface, even if you use only one remote device (border leaf).
  • An interface bundle can be used across the two interfaces.

Using separate links towards the remote devices


From a layer 3 perspective, you can use the second adjacency as a backup link, using AS-path prepending or similar methods. Alternatively, you can use ECMP by installing the routes from both adjacencies on the firewall and on the remote devices. This utilizes both links and gives better convergence during a fail-over, since the devices will not have to update their RIB (most firewalls do not have additional-path support). From the EVPN perspective, the tenant routing instance has to install the routes from both border leafs, which is the default in most implementations, as fabrics tend to use ECMP on all paths.
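
Both variants are straightforward to express in configuration. A hedged sketch, again with Arista-style syntax and hypothetical names and AS numbers: the route-map demotes the second adjacency to a backup via AS-path prepending, while maximum-paths instead enables ECMP over both adjacencies:

route-map FW_BACKUP permit 10
   set as-path prepend 65100 65100        # apply outbound on the backup adjacency only
!
router bgp 65001
   vrf TENANT_A
      maximum-paths 2 ecmp 2              # ECMP alternative: install routes from both adjacencies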

Using interface bundling

Obviously, the remote side of the bundle also has to be configured for bundling, using a multi-chassis link aggregation technology (vPC, MLAG, CLAG, etc.). This solves the design from a layer 2 perspective. But what about layer 3? How are we going to handle two routing adjacencies, and how will the traffic flow?

Since we are using a bundle interface, traffic is hashed across both links. This means the firewall can send traffic destined to border leaf 1 out of the link towards border leaf 2. Border leaf 2 will then switch the traffic over to border leaf 1, because the destination layer 2 address of the frame belongs to border leaf 1. This happens whether a single layer 3 path is used or both border leaf paths are installed for ECMP or fast convergence; either way, each border leaf will receive traffic destined for the other.

Switching traffic to the remote border leaf requires extra functionality, and it causes different problems in different implementations.

If EVPN ESI is used, the cross traffic between border leafs is switched using VXLAN. Since the traffic from the firewall is destined towards a host behind another leaf, it has to be switched again into another VNI. This requires VXLAN decapsulation and re-encapsulation on the receiving border leaf, i.e. VXLAN gateway functionality.


If a multi-chassis protocol is used, traffic will be switched over the peer link towards the remote border leaf, where it is encapsulated towards a leaf. But this can be a problem that varies from vendor to vendor. For example, on Cumulus this traffic is treated as unknown unicast and may overwhelm orphan hosts connected to the border leafs (https://www.redpill-linpro.com/techblog/2018/02/26/layer3-cumulus-mlag.html).

There is another obvious drawback: the traffic traverses both border leafs, adding latency.

However, there is a solution. The problem is that the destination layer 2 address differs between the two paths received from the border leafs. Using the same virtual MAC for both paths' next-hops is the solution. This can be achieved with two methods, one of which is problematic.

The first method is to use a virtual IP address, as in the anycast gateway implementation on EVPN fabrics: the same virtual MAC is tied to the anycast gateway IP address on both devices, while each switch keeps a different underlying IP address on that SVI. This solves the traffic flow. The next-hop may differ on the firewall's two paths, but the destination MAC behind each next-hop is the same, so each border leaf considers the traffic destined to itself and switches the packet directly towards the destination leaf. This is not the desired method, however, because it breaks on some vendors' anycast gateway implementations: the same virtual IP address ends up being used as the BGP peering address on both switches, and even BFD will fail.

The better solution is to use different IP addresses on the SVIs, but set the next-hop of the routes advertised to the firewall to a virtual IP address configured as an anycast gateway address on both border leafs.

That way, the destination MAC of any packet from the firewall towards the border leafs is the virtual MAC shared between them, so either border leaf handles the packet locally, identical to the anycast gateway implementation.

During my tests this method gave roughly 200-300 ms convergence. Using dedicated links on the firewall, as in the first method, gave roughly 600 ms to 1 s fail-over time: that method involves updating the L3 table (installing the second IP route) and then updating the L2 table. On the firewall side, support for pre-negotiating LACP on the passive unit improves convergence. Without it, when the active firewall fails, the passive one establishes LACP from scratch, which dramatically affects convergence time; this issue does not exist in solution 1, which uses dedicated links between the firewall cluster and the border leafs.

Also, if you enable add-path support for the appropriate tenant networks, fail-over on a border leaf failure will be much faster. Below is a sample configuration for Arista border leafs:
# ESI and layer 2 configuration, which must be identical on both border leafs.

vlan X
   name DC_FIREWALL_VLAN
   trunk group DC_FIREWALL
!
interface Port-ChannelX
   switchport mode trunk
   switchport trunk group DC_FIREWALL
   !
   evpn ethernet-segment
      identifier X
      route-target import X
   lacp system-id X
!
interface Ethernet...                     # firewall cluster interfaces, identical on all of them
   switchport mode trunk
   switchport trunk group DC_FIREWALL
   channel-group X mode active
!
ip virtual-router mac-address ....        # must be the same on both border leafs; used as the MAC address of the virtual IP address
!
interface VlanX
   vrf DC_FIREWALL
   ip address A.B.C.1/29                  # .2 is border leaf 2's address on the same SVI
   ip virtual-router address A.B.C.3      # MUST be the same on both border leafs
!
route-map DC_FW_OUT permit 10
   set ip next-hop A.B.C.3
!
router bgp X
   ...underlay configuration
   vrf DC_FIREWALL                        # L3 VRF configuration; organize a route-target structure to reach the appropriate tenant VRFs
      neighbor A.B.C.4 ...                # firewall IP address on that SVI; enable graceful restart and BFD for faster
                                          # convergence, and apply route-map DC_FW_OUT outbound to set the next-hop of advertised routes
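
For the add-path idea mentioned above, the knobs involved look roughly like this (a sketch only; exact command availability for the EVPN or tenant address family should be verified per EOS release):

router bgp X
   address-family evpn
      bgp additional-paths send any       # advertise all paths, not only the best path
      bgp additional-paths receive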

This bundling setup can also be applied to any other external service, for example a Linux machine running bird for EBGP (or you can use static routing). You can use a dedicated L3 VRF for load balancers and then import the load balancer networks into the appropriate tenant networks. This traffic will be routed towards both border leafs. The setup is much simpler when using BGP on the load balancer; with static routing, be sure to set the next-hop of the route on the load balancer to the virtual IP address on the border leafs.
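
A sketch of the load balancer variant on the border leafs (EOS-style; the VRF name, route-targets, AS numbers and addresses are hypothetical, and the import side must match your route-target scheme):

vrf instance DC_LB
!
interface Vlan200
   vrf DC_LB
   ip address 198.51.100.1/29
   ip virtual-router address 198.51.100.3 # next-hop target for the static-routing variant
!
router bgp 65001
   vrf DC_LB
      rd 198.51.100.1:200
      route-target export evpn 65001:200  # tenant VRFs import this RT to reach the load balancers
      route-target import evpn 65001:10   # import a tenant RT so return traffic can be routed
      neighbor 198.51.100.4 remote-as 65200   # bird on the load balancer announces its service prefixes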


Using EBGP, you can also deploy multiple load balancers and spread incoming traffic across them by load balancing towards the internet service prefixes they announce.
