ESI Multihoming with an eBGP-Only Underlay on Nexus

If you are not careful, you may miss the prerequisite below while configuring ESI multihoming on Nexus switches:

If eBGP is used with VXLAN EVPN multi-homing, the administrative distance for local learned endpoints must be lower than the value of eBGP. The administrative distance can be changed by entering the fabric forwarding admin-distance distance command.

My switches are connected to a server (SRV), with the SVI configured on both of them.

The switch-side configuration, which is the same on both of them:

interface port-channel9
  switchport mode trunk
  switchport trunk allowed vlan 100,200
  ethernet-segment 9
    system-mac 0000.0000.2011
  spanning-tree port type edge

interface Vlan100
  no shutdown
  vrf member FW_ZONE_X
  ip address 10.1.100.0/31
  fabric forwarding mode anycast-gateway

When you look at the IP routing table:

S1-BL2# sh ip route 10.1.100.1 vrf FW_ZONE_X
IP Route Table for VRF "FW_ZONE_X"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.1.100.1/32, ubest/mbest: 1/0
    *via 192.168.1.98%default, [20/0], 1d11h, bgp-61099, external, tag 61201, segid: 100100 tunnelid: 0xc0a80162 encap: VXLAN
    via 10.1.100.1, Vlan100, [190/0], 00:08:55, hmm

This causes traffic for 10.1.100.1 to loop between the two switches, as each has a route pointing to the other. Thus, after lowering the administrative distance of the locally learned routes:

fabric forwarding admin-distance 19

S1-BL2# sh ip route 10.1.100.1 vrf FW_ZONE_X
IP Route Table for VRF "FW_ZONE_X"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.1.100.1/32, ubest/mbest: 1/0, attached
    *via 10.1.100.1, Vlan100, [19/0], 00:00:26, hmm
    via 192.168.1.98%default, [20/0], 1d11h, bgp-61099, external, tag 61201, segid: 100100 tunnelid: 0xc0a80162 encap: VXLAN

Anycast inside the Data Centers

Why

Anycast service prefixes are advertised into the data center from multiple hosts. This gives the fabric multiple next hops for the given service, that is, the advertised prefix, so we can achieve load balancing, better failover, and scalability.

Solution

In order for all hosts to receive traffic, we must use ECMP (Equal-Cost Multipath) for the anycast prefix. How this is achieved depends on the type of fabric and the protocols used to run the data center.

EVPN & VXLAN Based:

  • Add/receive multiple paths for the EVPN address family: anycast services require not only the best path but also advertising and receiving multiple paths for the same NLRI (Network Layer Reachability Information).
  • Maximum-paths configuration, not only for the L2VPN EVPN address family but also for the unicast address family on the edge switches where the hosts are connected.
  • AS-path ignore/relax for the best-path calculation, depending on your configuration.
  • External routes are carried in EVPN route-type 5. Routes originated from the same leaf switch require some additional tuning, as they will look identical once encapsulated into route-type 5. Setting the gateway IP to the next hop, i.e. the host advertising the prefix, may resolve the issue. But then you cannot use RFC 5549 between the switch and the host, because the gateway IP must be IPv4; in other words, RFC 5549 cannot be used for anycast services. Additional BGP tuning may be required depending on whether the setup is eBGP- or iBGP-based, and the BGP best-path selection algorithm should be tuned for the solution. A combined configuration sketch for these knobs follows this list.
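
As a minimal sketch of the knobs above (the AS number is hypothetical, and the availability of additional-paths under the L2VPN EVPN address family depends on the NX-OS release, so verify the commands on your platform):

router bgp 65001
  ! allow ECMP across eBGP paths whose AS-path contents differ
  bestpath as-path multipath-relax
  address-family l2vpn evpn
    ! install multiple equal-cost EVPN paths
    maximum-paths 4
    ! advertise and accept more than the single best path per NLRI
    additional-paths send
    additional-paths receive
    additional-paths selection route-map ADD-PATH-ALL
  vrf FW_ZONE_X
    ! ECMP for the unicast address family toward the connected hosts
    address-family ipv4 unicast
      maximum-paths 4

route-map ADD-PATH-ALL permit 10
  ! mark all paths as candidates for additional-path advertisement
  set path-selection all advertise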

Routed/IP fabric: Here we are mostly talking about BGP-based setups. Depending on whether eBGP or iBGP is used, the problems change, and a solution may not even be possible.

      • Receiving multiple paths over the same neighborship: spine and leaf form an eBGP neighborship. When a leaf that is connected to more than one anycast-advertising host forwards multiple paths for the same destination, the spine will only install a single one of them; a possible add-path workaround is sketched below.
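
One possible workaround, assuming the NX-OS release in use supports BGP additional paths for IPv4 unicast, is to enable add-path on the leaf-spine sessions so the spine can accept and relay more than one path per prefix (AS number and route-map name are hypothetical):

router bgp 65000
  address-family ipv4 unicast
    ! exchange multiple paths per prefix instead of a single best path
    additional-paths send
    additional-paths receive
    additional-paths selection route-map ADD-PATH-ALL

route-map ADD-PATH-ALL permit 10
  ! mark all paths as candidates for additional-path advertisement
  set path-selection all advertise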
LEAF BGP table – it has 4 multipaths, which are locally connected.