I wrote these notes while researching how to enable a Layer-3-only setup for OpenStack. They are meant to summarize what I have learned and to give my colleagues a basic explanation of some of the terms and problems involved.
I explain cloud networking from the fundamentals, so some statements, such as "this cannot be done", may not be strictly true. What I mean in those cases is that doing it would require very complex techniques or even more complex operations, so another approach is more practical from a technical or operational point of view.
I intend to explore dynamic routing for OpenStack to build a Layer-3-only underlay, which I will cover in more detail in the following post.
Why Overlay Networks
There can be two different operating models for customers of a virtual environment, similar to any seller-buyer product. In the first model, administrators create resources for customers; in the second, customers create resources themselves, either freely or from a pre-defined list. Since network resources are part of virtualization, customers can either freely create their own networks or choose from pre-defined networks already created by the administrator.
The term network virtualization is frequently used interchangeably with overlay networking, but the two differ. Network virtualization means building a logical network over physical resources, as with all types of virtualization. Let’s now look at overlay networks.
Overlay networking involves building your own network over another without the underlying network being aware. A typical example of this is the VPN services used by companies. When you connect to your company network over the Internet, the VPN service builds a logical tunnel between your computer and the company network.
This is almost exactly how cloud networking works: the cloud operator builds a platform and gives customers the flexibility to create their own networks.
In a data center, a single server is not enough, so a large group of network devices is used to build the underlying network, called the underlay. The cloud platform builds an overlay network for each customer on top of this underlay. However, overlay networks are sometimes confused with network segmentation, which only separates customers’ network traffic from each other. Let’s explain this with a couple of examples.
Overlay networks are generally needed for IPv4 networks. IPv4 address exhaustion is mitigated by using private IPv4 addresses, which are not routable on the Internet. They are used nearly everywhere: home networks, campus networks, company networks, and so on.
Since those private ranges do not belong to anyone, the same prefix can be used in different companies or home networks as long as they do not need to be interconnected. The problem appears when different companies that use the same IP address space want to connect their networks; NAT is the usual solution.
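To make the overlap problem concrete, here is a minimal sketch using Python's standard `ipaddress` module. The company names and prefixes are made up for illustration; the only point is that two independently chosen private prefixes can collide, and that such networks cannot be routed together directly.

```python
import ipaddress

# Hypothetical prefixes picked independently by three unrelated companies.
company_a = ipaddress.ip_network("192.168.1.0/24")
company_b = ipaddress.ip_network("192.168.1.0/24")
company_c = ipaddress.ip_network("10.20.0.0/16")


def can_interconnect_without_nat(left, right):
    """Two networks can be routed together directly only if they do not overlap."""
    return not left.overlaps(right)


print(company_a.is_private)                                # private address space
print(can_interconnect_without_nat(company_a, company_b))  # same prefix on both sides
print(can_interconnect_without_nat(company_a, company_c))  # distinct prefixes
```

When the check fails, one side has to renumber or the traffic has to pass through NAT.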
From a data center point of view, cloud platforms struggle to provide dedicated public IPv4 addresses because of this depletion, so they often use private address space. Of course, a customer who can afford it is welcome to allocate a public IP address to every VM.
Here comes the problem of self-service networks. Let’s look at the picture, assuming all customers can create networks and assign any prefix from the private IP address space. We also have a network that lets servers communicate within our data center, and it may use prefixes from the private IP address space as well.
In addition to addressing overlapping prefixes, cloud platforms need to segregate customer traffic to prevent different customer networks from communicating with each other. This can be solved using IP rules, access control lists, etc. However, overlay networks can also solve this problem, eliminating the need for complex access lists between customers.
By building an overlay network for each customer, cloud platforms can allow customers to use the same private IPv4 address space, prevent customer networks from communicating with each other unless authorized by the cloud platform, and enable customer networks to be built on any data center network, even across servers connected over the Internet. This leads to a more technical term in cloud networking: the external network.
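The idea can be sketched as a lookup table that is keyed by the customer network as well as the VM address. This is only a toy model under my own assumptions: `OverlayFabric`, the network IDs, and the addresses are hypothetical and are not how any particular cloud platform stores this information.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class OverlayFabric:
    """Toy model: maps (overlay network ID, VM address) to the underlay address of the hosting server."""

    _locations: dict = field(default_factory=dict)

    def attach(self, network_id, vm_ip, host_ip):
        key = (network_id, ipaddress.ip_address(vm_ip))
        self._locations[key] = ipaddress.ip_address(host_ip)

    def lookup(self, network_id, vm_ip):
        # A lookup never crosses network IDs, so customers stay isolated
        # even when they use exactly the same private addresses.
        return self._locations.get((network_id, ipaddress.ip_address(vm_ip)))


fabric = OverlayFabric()
fabric.attach(1001, "192.168.1.10", "10.0.0.11")  # customer A's VM
fabric.attach(1002, "192.168.1.10", "10.0.0.12")  # customer B's VM, same address
```

Because the network ID is part of the key, the same VM address resolves to a different server for each customer, and a customer without an entry gets nothing back.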
How to create networks for cloud users
How virtual networks are created depends on the mechanism used on the cloud servers. For example, an OpenStack cloud can use an Open vSwitch and OpenFlow-based solution. It is also essential to understand the concept of pre-defined networks and the path that uses Layer 2 between the data center switches and the cloud servers. Let’s explain these concepts with some examples.
The first type of network that can be used is a VLAN network, defined on the data center network equipment. The data center network is responsible for building those VLANs, and the default gateway of each VLAN may be a router, a firewall, etc. Different customers can share the same VLAN, or a different VLAN can be used for each customer. However, traffic isolation between customers can be challenging: customers can be put in different zones on a firewall, or a different router can be used for each VLAN. The main point is that these networks must be pre-defined by the data center operator before customers can use them. Alternatively, all traffic filtering can be done by the cloud platform, which can isolate customer traffic with Layer 3/4 filters applied to each customer’s VM port. In that case, no firewall is needed for this purpose.
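A Layer 3/4 filter on a VM port can be pictured as a list of allow rules with an implicit default deny. The sketch below is a simplified illustration, not the actual filtering code of OpenStack or Open vSwitch; `PortRule` and `is_allowed` are names I made up.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRule:
    """Hypothetical ingress allow rule attached to a VM port."""

    protocol: str  # Layer 4 protocol, e.g. "tcp" or "udp"
    port: int      # destination port
    source: str    # allowed source prefix in CIDR notation (Layer 3)


def is_allowed(rules, protocol, dst_port, src_ip):
    """Default deny: traffic passes only if at least one rule matches it."""
    src = ipaddress.ip_address(src_ip)
    return any(
        rule.protocol == protocol
        and rule.port == dst_port
        and src in ipaddress.ip_network(rule.source)
        for rule in rules
    )


rules = [
    PortRule("tcp", 22, "10.0.0.0/24"),  # SSH only from the customer's own subnet
    PortRule("tcp", 443, "0.0.0.0/0"),   # HTTPS from anywhere
]
```

With these rules, SSH from another customer's prefix is dropped at the port itself, without a firewall between the VLANs.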
OpenStack refers to these as provider networks. They are provided to OpenStack by the data center administrators, who configure the VLAN ID, prefix, default gateway, etc., for each network and pass this information to the cloud administrators.
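The information handed over for a provider network can be captured in a small record with a couple of sanity checks. This is only a sketch of the hand-off; `ProviderNetwork` and its field names are hypothetical and are not the Neutron API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderNetwork:
    """Hypothetical record of what the data center team gives to the cloud team."""

    name: str
    vlan_id: int
    prefix: ipaddress.IPv4Network
    gateway: ipaddress.IPv4Address

    def __post_init__(self):
        # IEEE 802.1Q leaves VLAN IDs 1..4094 usable (0 and 4095 are reserved).
        if not 1 <= self.vlan_id <= 4094:
            raise ValueError(f"invalid VLAN ID: {self.vlan_id}")
        # The default gateway must live inside the prefix of the VLAN.
        if self.gateway not in self.prefix:
            raise ValueError(f"gateway {self.gateway} is outside {self.prefix}")


provider_net = ProviderNetwork(
    name="provider-vlan-100",
    vlan_id=100,
    prefix=ipaddress.ip_network("192.0.2.0/24"),
    gateway=ipaddress.ip_address("192.0.2.1"),
)
```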
The first problem with this approach is scaling the data center network:
- Large Layer-2 domains cause high BUM traffic.
- Centralized routing for east-west traffic.
- Switch hardware has limited resources, such as a limited number of Layer-2 table entries. More advanced technologies like EVPN each need their own hardware resources, which also affects scaling.
These issues will be covered later; let’s skip them for now and continue.
The other type of network, as mentioned, is an overlay network built by the cloud platform.
These networks can be VLANs, VXLANs, or any other type of network. Customers can use the same VLAN ID, prefix, etc., and they can create multiple networks for their projects. These types of networks are called self-service networks.
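As an illustration of how an overlay keeps customer traffic apart on the wire, here is a minimal sketch of the 8-byte VXLAN header from RFC 7348, which carries a 24-bit network identifier (VNI) in front of the customer's original frame. The function names are mine, and the outer Ethernet, IP, and UDP headers that a real packet also needs are deliberately left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prepend a VXLAN header to the customer's original Layer-2 frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits + 24 reserved bits. Word 2: 24-bit VNI + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet):
    """Return (vni, inner_frame) from a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8, packet[8:]
```

The receiving server reads the VNI to decide which customer network the inner frame belongs to, so two customers can reuse the same VLAN ID or prefix inside their own networks.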
So, what happens when a customer wants to access the Internet or needs to connect their different networks to each other?
For provider-type networks, this is simple. They can be routed through a firewall, router, load balancer, etc., and then NAT’d to the Internet. There can be a more complex setup using provider networks where all VLANs can use the same prefix, which requires VRFs on the data center network.
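The NAT step can be sketched as a translation table that maps each private (address, port) pair to a port on one shared public address. This is a toy source-NAT model under my own assumptions; `SourceNat`, the starting port, and the addresses are hypothetical, and real NAT also tracks protocol, destination, and timeouts.

```python
import itertools


class SourceNat:
    """Toy source NAT: many private endpoints share one public address."""

    def __init__(self, public_ip, first_port=20000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self._outbound:
            public_port = next(self._ports)
            self._outbound[key] = public_port
            self._inbound[public_port] = key
        return self.public_ip, self._outbound[key]

    def translate_back(self, public_port):
        # Replies are only accepted for flows that were opened from the inside.
        return self._inbound.get(public_port)
```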
For self-service networks, the cloud platform has to route this traffic to an external network.
The solution to this problem is to offer router functionality to customers.
Customers can create networks and connect them via routers, and different customers’ networks can be connected to the same router. If they share a pre-defined router, they cannot use the same prefix and have to choose from a pool of prefixes for their project. Isolating the traffic of different customers at the router is another problem that needs to be solved.
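The shared-router constraint can be sketched as a prefix pool that hands out non-overlapping subnets and rejects anything that collides with a network already attached. `SharedRouter`, the pool, and the network names are hypothetical and only illustrate the rule.

```python
import ipaddress


class SharedRouter:
    """Toy model of a pre-defined router shared by several customer networks."""

    def __init__(self, pool, new_prefix):
        # Carve the pool into equally sized subnets that can be handed out.
        self._free = list(ipaddress.ip_network(pool).subnets(new_prefix=new_prefix))
        self.attached = {}  # network name -> prefix

    def conflicts(self, prefix):
        """True if the prefix overlaps any network already attached to this router."""
        candidate = ipaddress.ip_network(prefix)
        return any(candidate.overlaps(existing) for existing in self.attached.values())

    def allocate(self, network_name):
        """Attach a network using the next free prefix from the pool."""
        if not self._free:
            raise RuntimeError("prefix pool is exhausted")
        prefix = self._free.pop(0)
        self.attached[network_name] = prefix
        return prefix


router = SharedRouter("10.0.0.0/22", new_prefix=24)
net_a = router.allocate("customer-a-net")
net_b = router.allocate("customer-b-net")
```

Each customer ends up with a distinct prefix behind the same router, which is exactly the freedom self-service customers lose when the router is shared.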
How do we connect those routers to the external world, the Internet, etc.? Undoubtedly by using another network, a provider network:
Again, following the same principle as for self-service networks, different routers can use the same external network. One of the main problems of using overlay networks is that the data center network needs to learn about the customer networks, which makes routing traffic difficult. By the way, I should also mention distributed routing here. Let’s change the picture into a centralized one.