ARP timeout and MAC address table consistency in an EVPN data center

Problem:

  • A large number of BGP updates, even though we do not use vMotion.
  • Loss of connectivity for some monitoring systems that rely on network scanning, and for the iLO management programs of server vendors.
  • Fast scanning applications sometimes lost the first 1-3 packets destined to a target host.

What We Have Found:

We narrowed the problem down to networks with very silent hosts, physical server interfaces, or infrastructure networks.

We found that we were hitting a mismatch between the default ARP timeout and the default MAC address table aging time. When a MAC address is removed from the table, the switch sends an EVPN update. By default, the ARP timeout is usually much longer than the MAC address table aging time, and this holds for all switch vendors. If we lower the ARP timeout below the MAC address table aging time, the switch sends a unicast ARP request to the host when the entry's last-seen value reaches the timeout value, at a somewhat randomized point (the exact math varies by vendor). The host may have been last seen a second ago or, with long ARP timeouts, minutes ago.
When the host responds, its MAC aging timer is reset, so the MAC entry does not time out and the switch does not send a BGP update.
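
The interaction between the two timers can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the host is completely silent except for answering ARP requests, time advances in whole seconds, vendor-specific randomization is ignored, and the function name and timer values are made up for illustration rather than taken from any vendor's defaults.

```python
def count_evpn_updates(mac_aging, arp_timeout, duration):
    """Toy model of one silent host behind one switch.

    Returns (withdraws, advertises): how many times the MAC entry aged
    out (EVPN withdraw) and was relearned (EVPN advertise) in `duration`
    seconds. All timers are in seconds.
    """
    mac_present = True      # MAC entry is learned at t = 0
    mac_last_seen = 0       # last time the switch saw a frame from the host
    arp_last_refresh = 0    # last time the ARP entry was refreshed
    withdraws = 0
    advertises = 0

    for t in range(1, duration + 1):
        # ARP entry reached its timeout: the switch sends a unicast ARP
        # request and the (otherwise silent) host answers it.
        if t - arp_last_refresh >= arp_timeout:
            arp_last_refresh = t
            mac_last_seen = t   # the ARP reply also refreshes the MAC entry
            if not mac_present:
                mac_present = True
                advertises += 1  # MAC relearned, EVPN route advertised

        # MAC entry aged out: the switch withdraws the EVPN route.
        if mac_present and t - mac_last_seen >= mac_aging:
            mac_present = False
            withdraws += 1

    return withdraws, advertises


# Illustrative values only: ARP timeout much longer than MAC aging.
print(count_evpn_updates(mac_aging=300, arp_timeout=1500, duration=3600))
# Illustrative values only: ARP timeout lowered below MAC aging.
print(count_evpn_updates(mac_aging=300, arp_timeout=240, duration=3600))
```

In this model the first call produces repeated withdraw and advertise events for the same silent host, while the second call produces none, because every ARP refresh arrives before the MAC entry can age out.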

When the switch resets its ARP timer:

ARP is a control plane process and is generally handled by the switch CPU. Just as in Linux, where ARP packets are processed by the kernel, switches handle ARP in the CPU or control plane.

ARP tables are held in the control plane. It is commonly believed that the switch resets a host's ARP timer whenever it receives any packet from that host, which is wrong. The switch needs a packet destined for its control plane to reset the host's timer. This may be an ICMP packet destined to the switch's own IP address, typically the host pinging its default gateway, or any other packet that has to be processed by the switch control plane, such as an ARP request. The host's data traffic, however, is never handled by the control plane or CPU in modern switches. Data packets are forwarded in hardware.
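
The rule above can be written down as a small classifier. This is a simplified sketch, not any vendor's actual punt logic: the Packet type, the field names, and the addresses are hypothetical, and real switches decide what reaches the CPU with platform-specific rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str  # e.g. "arp", "icmp", "tcp"


def refreshes_arp_timer(pkt, switch_ips):
    """Return True if the packet would reach the control plane in this
    simplified model, and therefore refresh the sender's ARP entry."""
    if pkt.protocol == "arp":
        return True                    # ARP is processed by the CPU
    return pkt.dst_ip in switch_ips    # traffic addressed to the switch itself


# Hypothetical addresses: 10.0.0.1 is the switch (default gateway).
switch_ips = {"10.0.0.1"}
ping_gateway = Packet("10.0.0.50", "10.0.0.1", "icmp")
data_traffic = Packet("10.0.0.50", "192.0.2.10", "tcp")
arp_request = Packet("10.0.0.50", "10.0.0.1", "arp")

print(refreshes_arp_timer(ping_gateway, switch_ips))  # ping to the gateway
print(refreshes_arp_timer(data_traffic, switch_ips))  # hardware-forwarded data
print(refreshes_arp_timer(arp_request, switch_ips))   # ARP request
```

Only the ping to the gateway and the ARP request count here. The transit TCP packet is forwarded in hardware and leaves the ARP timer untouched, no matter how much of that traffic the host sends.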

This is also true for the Linux kernel. The ARP code itself does not decide when an ARP table entry is refreshed. Upper-layer protocols choose whether to refresh the timer. For example, ICMP does not refresh the destination host's ARP timer, which causes the kernel to send ARP requests while pinging. ICMP is not interested in whether the target host's MAC address has changed. It is only interested in layer 3 information. TCP, however, does refresh the timer. (If the host's MAC address has changed, the host itself has changed, and because TCP is stateful the connection should be dropped.)

Be careful with the timers on the switch. Lowering the ARP timeout helps with the silent host problem, but it should not be too low. The timeout must be long enough for control plane traffic from the hosts, usually their ARP requests, to reach the switch within the interval. This eliminates unnecessary ARP requests for live hosts.
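
Both constraints can be captured in a small sanity check. This is a minimal sketch: the function name is hypothetical, host_arp_interval stands for an assumed estimate of how often a live host sends its own ARP requests to the switch, and all values are in seconds and purely illustrative.

```python
def check_timers(arp_timeout, mac_aging, host_arp_interval):
    """Return a list of warnings for a proposed timer configuration.

    An empty list means the values satisfy both rules of thumb from
    the text above. All arguments are in seconds.
    """
    problems = []
    if arp_timeout >= mac_aging:
        problems.append(
            "ARP timeout is not below MAC aging: silent hosts can age out "
            "of the MAC table and trigger EVPN updates."
        )
    if arp_timeout <= host_arp_interval:
        problems.append(
            "ARP timeout is not above the hosts' own ARP interval: the "
            "switch will send unnecessary ARP requests for live hosts."
        )
    return problems


# Illustrative values only.
for warning in check_timers(arp_timeout=240, mac_aging=300, host_arp_interval=60):
    print(warning)
```

With the illustrative values above the check prints nothing. Raising arp_timeout to or above mac_aging, or dropping it to or below host_arp_interval, produces the corresponding warning.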

 

http://www.embeddedlinux.org.cn/linux_net/0596002556/understandlni-CHP-28-SECT-9.html

 

http://www.embeddedlinux.org.cn/linux_net/0596002556/understandlni-CHP-28-SECT-5.html#understandlni-CHP-28-SECT-5