OpenStack Networking Options

This post covers the networking options available with OpenStack. I will also try to determine whether those networking options can be configured without using a Layer-2 network.

First, the prerequisites:

  • We need to scale to a massive number of servers.
  • We need as much redundancy as possible.
  • We need an optimal path for our applications.

Thus, servers will be connected to two top-of-rack switches for redundancy and fail-over, and the data center network will be Layer-2 free, using EVPN or similar technologies.

OpenStack ML2 plugins rely heavily on Layer-2 adjacency for advertising local VM IP addresses. I will explain the basic networking options.

Provider Networks


Using OpenStack routers is pointless for provider networks. You also cannot use the OpenStack NAT service, although you can still use DHCP by defining a subnet range to be managed by OpenStack. To access the Internet, a VM would need a public IP address, which is not practical here: we assume all networks use addresses from the private IP address space. Thus, we need a NAT service outside of OpenStack.

Also, projects can use different networks, and those networks can use the same IP addresses. If so, we need a VRF/VPN service to map each external network to a VRF. In this way, different external provider networks can use the same subnet.
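
As a concrete reference, here is a minimal sketch of creating such an external provider network with the openstacksdk Python API. The cloud name, physical network label, VLAN ID, and addressing are assumptions for illustration only.

    import openstack

    # Assumed cloud entry in clouds.yaml; adjust to your environment.
    conn = openstack.connect(cloud="mycloud")

    # A VLAN-backed provider network mapped to a physical network on the TOR side.
    ext_net = conn.network.create_network(
        name="provider-ext",
        provider_network_type="vlan",           # assumed segmentation type
        provider_physical_network="physnet1",   # assumed bridge mapping name
        provider_segmentation_id=210,           # assumed VLAN ID
        is_router_external=True,
    )

    # Private addressing, as discussed above; NAT has to happen outside OpenStack.
    conn.network.create_subnet(
        network_id=ext_net.id,
        name="provider-ext-subnet",
        ip_version=4,
        cidr="10.0.100.0/24",
        gateway_ip="10.0.100.1",
        is_dhcp_enabled=True,
        allocation_pools=[{"start": "10.0.100.50", "end": "10.0.100.200"}],
    )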

We can use bonding or VLAN-based forwarding between the switches and the server. Of those options, only bonding requires special attention: bonding towards two different remote devices requires advanced, often vendor-proprietary techniques such as vPC, MLAG, or EVPN-ESI. But bonding has advantages for link fail-over. If something goes wrong on the switch, such as an uplink failure, the switch can suspend the port via LACP, which allows the remote server to remove that port from the bond. Without LACP, the switch would need to shut the port down.

Can we remove Layer-2 between the servers and the switches? To answer that, we must first analyze whether we need to connect our virtual servers directly to the provider network.

As long as VMs are attached to the provider network, we need to extend that network into the underlay (unless we use a routed provider network, which still needs Layer-2 towards the underlay or more complex mechanisms). So we must keep Layer-2 between the server and the switch.

Self-Service Networks

Self-service networks are the type of network each user can freely create for a project. Regardless of the prefix assigned (public, private, etc.), a self-service network is an overlay network configured on the compute hosts; it is unknown outside of OpenStack.


The segment created for a self-service network is extended to every other compute host using tunneling techniques such as Geneve or VXLAN. So far, since we have not used any external network between OpenStack and the underlay, we can carry on with Layer-3 links between the OpenStack servers and the underlay; we just need IP connectivity between the addresses used as tunnel endpoints.
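
For reference, a minimal openstacksdk sketch of a tenant creating such a network; the names and CIDR are assumptions, and the actual overlay type (Geneve, VXLAN, etc.) comes from the deployment's tenant network type configuration.

    import openstack

    conn = openstack.connect(cloud="mycloud")  # assumed cloud name

    # No provider attributes: Neutron picks the configured tenant overlay type
    # (for example Geneve with OVN) and allocates a segmentation ID itself.
    net = conn.network.create_network(name="blue-net")

    conn.network.create_subnet(
        network_id=net.id,
        name="blue-subnet",
        ip_version=4,
        cidr="192.168.10.0/24",   # assumed tenant prefix
        gateway_ip="192.168.10.1",
        is_dhcp_enabled=True,
    )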

Users can create multiple self-service networks and connect them via routers. Routers created by OVN are sets of flow rules; they are not processes like FRR or BIRD, nor virtual routers like vIOS.
The first approach uses a central node for routing between the different networks, as there can be only one default gateway. The scaling issues are easy to see, and this solution introduces a single point of failure. Still, we can keep Layer-3 between OpenStack and the underlay.
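
A minimal sketch of that router with the openstacksdk, assuming the external network and self-service subnets from the earlier sketches already exist ("red-subnet" is an assumed second tenant subnet); enable_snat relates to the NAT discussion further below.

    import openstack

    conn = openstack.connect(cloud="mycloud")

    ext_net = conn.network.find_network("provider-ext")   # external/provider network
    blue = conn.network.find_subnet("blue-subnet")
    red = conn.network.find_subnet("red-subnet")          # assumed second subnet

    router = conn.network.create_router(
        name="tenant-router",
        external_gateway_info={"network_id": ext_net.id, "enable_snat": True},
    )

    # Attach both self-service subnets; east-west traffic between them is now routed.
    conn.network.add_interface_to_router(router, subnet_id=blue.id)
    conn.network.add_interface_to_router(router, subnet_id=red.id)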

Distributed Virtual Routers

Distributed routers are the answer to centralized routing. Since we know all the networks and all the hosts, and with some MAC manipulation, it is possible to distribute the router functionality to all nodes. Details of this design can be found on Assaf Muller’s site:

Distributed Virtual Routing – Overview and East/West Routing

DVR enables local routing between networks on each compute node instead of using a centralized router.
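
With the ML2/OVS reference implementation, this is exposed as an admin-only router attribute; a hedged openstacksdk sketch is below (with OVN, east/west routing is distributed by default and no such flag is needed).

    import openstack

    conn = openstack.connect(cloud="mycloud")

    # 'distributed' is an admin-only extension attribute of Neutron routers.
    dvr = conn.network.create_router(
        name="tenant-router-dvr",
        is_distributed=True,
    )
    print(dvr.is_distributed)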

External access to and from Self-Service Networks

Routing between two networks requires, simply, a router.
We need an external network between the switch and our logical router; this network will be our default exit point. Provider networks can be used for this purpose.

But there is one more question that needs to be addressed.

Do we need NAT?

As we are using private addresses, we need a NAT service. You can use 1-1 NAT or NAT overload with OpenStack routers.

The first problem is NAT overload, which OpenStack calls SNAT. All self-service networks are NAT’d to the router’s external IP address, and this traffic must go through the central routing node.


For 1-1 NAT, routers are generally configured with a pool of IP addresses from the external network. When a VM needs a floating IP address (1-1 NAT), the router assigns an available one from the pool. DVR can be used for this kind of traffic: the logical router instance on the same host as the VM replies to ARP requests destined for the floating IP. So we are back to the same technique, where Layer-2 methods are used.
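
For reference, associating a floating IP with a VM port via the openstacksdk looks roughly like this; the server and network names are assumptions.

    import openstack

    conn = openstack.connect(cloud="mycloud")

    ext_net = conn.network.find_network("provider-ext")
    server = conn.compute.find_server("vm-1")             # assumed instance name

    # Take the first Neutron port bound to the instance.
    port = next(conn.network.ports(device_id=server.id))

    # 1-1 NAT: allocate an address from the external pool and bind it to the port.
    fip = conn.network.create_ip(
        floating_network_id=ext_net.id,
        port_id=port.id,
    )
    print(fip.floating_ip_address)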

Without a Layer-2 external network, we need to find a way to advertise each floating IP as a host route into the underlay network, pointing towards the virtual router. We could also fall back to the central routing node, but that will not scale.

OpenStack has a BGP agent (neutron-dynamic-routing) responsible for dynamic routing support, but with limited scale and features. It is generally used for tenant network advertisement, so you cannot use the same prefix for different networks.

OpenStack addresses this with the address scope concept, where the operator picks prefixes from a pool to avoid address conflicts; you can find more information in the OpenStack docs. Suppose we want to use overlays inside OpenStack but do not want to extend those overlays into the data center. In that case, we must create corresponding overlays in the data center network and interconnect them with OpenStack networking.
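
As a side note on the address scope mechanics, a minimal openstacksdk sketch of the scope and subnet pool objects is below; the scope name, pool prefix, and prefix length are assumptions.

    import openstack

    conn = openstack.connect(cloud="mycloud")

    # Subnet pools in the same address scope are guaranteed not to overlap,
    # which keeps tenant prefixes unique and therefore routable/advertisable.
    scope = conn.network.create_address_scope(
        name="routable-v4",
        ip_version=4,
        is_shared=True,
    )

    pool = conn.network.create_subnet_pool(
        name="tenant-v4-pool",
        address_scope_id=scope.id,
        prefixes=["10.64.0.0/16"],     # assumed aggregate for tenant networks
        default_prefix_length=24,
    )

    # Tenants then allocate subnets from the pool instead of picking a CIDR.
    net = conn.network.create_network(name="green-net")
    conn.network.create_subnet(
        network_id=net.id,
        ip_version=4,
        subnet_pool_id=pool.id,
    )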

(Figure: just for the concept.)

OpenStack BGPVPN is used for this purpose. It interconnects the data center network and the OpenStack network (here only for EVPN, not MPLS or any other address family). When you create an overlay in OpenStack, you also need to create it in the data center.

From my point of view, you can also achieve this by using provider networks and binding them to different VRFs in the data center, which would be much easier.


Removing Layer-2 between the servers and the switches

We cannot use provider networks, as using them over Layer-3-only links is nearly impossible. If we do not need BUM traffic between our servers or VMs, we can still connect to each switch via Layer-2; the switches then need to advertise the VMs’ IP addresses into the data center network (redistribute ARP/neighbor entries). This can even be used with redundant TOR switches without extending Layer-2 to the other TOR switches. But can we really live without BUM? :)

Thus, the remaining choice is to use self-service networks. We need a router between the self-service networks and the TOR switches, and this seems perfect; a router is mandatory between a self-service network and anything outside it anyway.

There are a couple of obstacles that we need to solve:

  • We cannot use a centralized routing engine, for obvious reasons; we need a distributed solution.
  • We cannot directly establish a BGP neighborship with the logical router; even if we could, it would not be scalable for the TOR switches to establish a BGP session with every router.

There is an OVN BGP agent solution for Neutron that addresses these issues:

  • https://github.com/luis5tb/bgp-agent
  • https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-in-depth-traffic-flow-inspection/
  • https://www.openvswitch.org/support/ovscon2020/slides/OVS-CONF-2020-OVN-WITH-DYNAMIC-ROUTING.pdf

The BGP agent residing on the host establishes BGP sessions with the remote switches over the directly connected interfaces. This can be a BGP unnumbered session; otherwise, you must find a robust method for addressing the interfaces and generating the BGP configuration. BGP unnumbered seems perfect for this purpose.

This BGP agent registers for the necessary events, such as port (VM) create/delete and network create/delete, and announces that information to the remote switches. The remote network now learns about our networks. However, the current implementation only announces floating IPs; for tenant networks it still uses the centralized gateway node, which means external traffic into our tenant networks is routed centrally.
The main problem is on the server itself: kernel routing happens before the traffic reaches OVS.
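
To make the event-driven announcement above more concrete, here is a purely conceptual Python sketch; the function names and data structures are hypothetical and only illustrate the idea of turning a floating IP binding into a /32 host route announced from the hosting hypervisor (the real agent does this by exposing the address locally so an FRR instance can pick it up).

    from dataclasses import dataclass

    @dataclass
    class HostRoute:
        prefix: str     # e.g. "198.51.100.25/32" for a floating IP
        next_hop: str   # address that resolves to this hypervisor

    announced = {}      # floating IP -> HostRoute currently advertised

    def announce(route):
        # Hypothetical: hand the route to the local BGP speaker (e.g. FRR).
        print(f"announce {route.prefix} next-hop {route.next_hop}")

    def withdraw(route):
        print(f"withdraw {route.prefix}")

    def on_fip_bound(fip_address, hypervisor_address):
        """Called when a floating IP is bound to a VM port on this host."""
        route = HostRoute(prefix=f"{fip_address}/32", next_hop=hypervisor_address)
        announced[fip_address] = route
        announce(route)

    def on_fip_unbound(fip_address):
        """Called when the floating IP is disassociated or the VM is deleted."""
        route = announced.pop(fip_address, None)
        if route:
            withdraw(route)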

Using only provider networks may eliminate the pass through the kernel, by attaching the OVS bridge directly to the TOR switches. We still need to solve a couple of problems; as shown in the topology, Layer-2 remains on the TOR switches.

  • Routers’ external routing: we need an exit interface and a next-hop for our DVR. This next-hop can be deployed active-passive with any first-hop redundancy protocol. From a fail-over perspective, using LACP towards both TOR switches is better, which requires multi-chassis LACP support on the switch operating systems.
  • Before proceeding with BGP session establishment, we must understand the next-hop issue. By default, BGP sets the next-hop to its own source address unless configured otherwise, so we need to set the next-hop of any route advertisement related to a DVR to an IP address that will actually be forwarded to that router. The receiving switch will send an ARP request to find the Layer-2 address of the next-hop, and all DVRs connected to the same switch will receive this request; if they all have the same configuration, they will all reply. Thus, we must program each of them with a unique IP address (perhaps as an empty NAT record) to be used as the BGP next-hop. The BGP agent then needs to advertise an address from a pool carved out of the external network prefix and configure the local OVS instance to respond to ARP requests destined for that address. When traffic is received by OVN, it follows the OVN flows to the destination network (see the sketch after this list).
  • Establishing the BGP session itself is another problem. We need just one BGP session per host, which can be achieved by using another external network dedicated to that purpose. On the switches, we need dynamic BGP neighbor support.
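
Here is a hypothetical illustration of the per-chassis next-hop idea from the list above: each chassis claims one otherwise-unused address from the external prefix, answers ARP for it, and uses it as the next-hop in its advertisements. The prefix, offsets, and function names are assumptions, not an existing implementation.

    import ipaddress

    EXTERNAL_PREFIX = ipaddress.ip_network("10.0.100.0/24")  # assumed external network
    RESERVED = 10   # skip gateway and other infrastructure addresses

    def chassis_next_hop(chassis_index):
        """Pick a unique, otherwise-unused address from the external prefix.
        The local OVS/OVN instance would have to answer ARP for this address
        (e.g. programmed as an empty NAT record) so the TOR resolves the
        next-hop to this host only."""
        hosts = list(EXTERNAL_PREFIX.hosts())
        return str(hosts[RESERVED + chassis_index])

    def advertisement(prefix, chassis_index):
        """Build the route advertisement the BGP agent would send for a prefix."""
        return f"network {prefix} next-hop {chassis_next_hop(chassis_index)}"

    # Example: chassis 3 advertising a self-service network it hosts.
    print(advertisement("192.168.10.0/24", 3))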

We should not modify OVN itself; instead, the work should be done in the Neutron OVN plugin.

There is also another feature I have already discovered: BGP floating IPs over an L2-segmented network. This feature limits the Layer-2 domain to a single rack; it is even possible to reuse the same IP addresses and the same network in each rack. But it still lacks the ability to advertise self-service network host routes. I am not sure about the coding side of the feature, but I am guessing that the BGP next-hop process is somehow tied to floating IP assignment. If it were instead connected to any port creation event, it could easily be applied to self-service networks. The same approach could be applied to OVN as well.