Introduction to flannel (Detailed explanation of UDP, VXLAN, Host Gateway modes)

Flannel provides a cross-node network for containers. Its model assigns one large network to all containers in the cluster, carves a subnet out of that network for each host, and, when creating a container's network on a host, allocates the container an IP from that host's subnet. This fits the Kubernetes model of one IP per Pod, and flannel is simple and easy to use.
Flannel is one of the earliest cross-node container communication solutions; traces of it can be found in many network plug-ins, and other network solutions can be seen as improvements on it. Cross-node container access mainly faces the following challenges:
Duplicate container IP addresses: since container tools such as Docker only use the Linux kernel's network namespace to achieve network isolation, container IP addresses are assigned locally by the node they belong to. Globally, such a local address is like a house number inside one housing estate: over a larger scope it can easily be duplicated. To solve this, flannel designed a global network address allocation mechanism, using etcd to store the mapping between network segments and nodes; flannel then configures Docker (or another container tool) on each node to pick container IP addresses only from the segment allocated to that node. This guarantees globally unique IP address allocation.
Container IP address routing: a unique IP address does not by itself guarantee communication. There is a further problem: the IP and MAC addresses of the virtual network are usually not recognized on the physical network, so packets sent onto it are discarded as unroutable. The earliest method flannel used is an overlay network, which is essentially a tunnel network. Later, flannel developed several other approaches, corresponding to its different network modes.

In an overlay network, every packet sent onto the network is encapsulated with an additional header. This header usually contains the IP address of the host itself, because only the host's IP address can be routed and forwarded on the physical network. Depending on the encapsulation method, flannel provides two transports: UDP and VXLAN. UDP encapsulation uses a header format defined by flannel, and packets are encapsulated and decapsulated in Linux user space, so data entering the host undergoes two switches between kernel mode and user mode. VXLAN encapsulation uses a standard protocol built into the Linux kernel, so although its packet structure is more complex than flannel's UDP format, all encapsulation and decapsulation happen in the kernel, and the actual throughput is much higher than in UDP mode. But when flannel first became popular, around 2014, mainstream Linux kernel versions were relatively old and lacked the VXLAN kernel module, which earned flannel a reputation for being slow. The overlay is one way to solve container network address routing.
Routing is another way to solve container network address routing, and it is what flannel's host-gateway mode uses. From the above we know that the container network is unroutable because the hosts have no routing information for it; flannel, however, does have this information, so an intuitive method is simply to tell the nodes on the network. In host-gateway mode, the flannel agent running on each node writes the container network's routes into the host's routing table, so that every host holds routes for the entire container network. Host-gateway introduces no overlay encapsulation or decapsulation; it is plain network routing, with communication efficiency almost identical to a bare-metal direct connection. In fact, flannel's host-gateway mode even outperforms Calico. However, because flannel can only modify routing tables on the nodes themselves, once there are other routing devices between the nodes, such as a Layer 3 router, packets will be dropped by that device. Host-gateway mode can therefore only be used on a directly reachable Layer 2 network, which is usually kept fairly small because of broadcast-storm problems, although in recent years specialized equipment has appeared that can build large-scale Layer 2 networks (the "large Layer 2" networks we often hear about).

The design purpose of flannel is to re-plan IP address usage for all nodes in the cluster, so that containers created on different hosts get unique, routable IP addresses, and containers on different nodes can communicate directly over the internal network. So how does a node know which IPs are available and which are not? Flannel uses the distributed coordination features of etcd.
Architecturally, flannel is divided into a management plane and a data plane. The management plane is mainly an etcd cluster, used to coordinate the network segments allocated to containers on each node. The data plane is a flanneld process running on every node. Unlike other network solutions, flannel uses a serverless architecture: there is no central control node, which simplifies flannel's deployment and operation.
All flannel nodes in the cluster share one large container address range. Once flanneld starts, it watches etcd, learns from it which network segments containers on other nodes already occupy, applies to etcd for an available IP segment for its own node, and records that segment together with the host IP in etcd.
After flannel allocates each node's available IP range through etcd, it modifies Docker's startup parameters: for example, --bip= limits the IP range that containers on that node can obtain, ensuring that Docker on each node uses a different IP segment. Note that this range is assigned automatically by flannel; the records kept in etcd guarantee that ranges never overlap, with no manual intervention from the user.
The underlying implementation of flannel is essentially an overlay network (except for host-gateway mode): packets of one protocol are encapsulated inside another network protocol for routing and forwarding. The backends flannel currently supports include: UDP, VXLAN, alloc, host-gateway, AWS VPC, and GCE routing.
The best performing is host-gateway, which requires a Layer 2 network between nodes. AWS VPC and GCE routing rely on the route tables of the respective cloud platforms; without those cloud services, the maintenance cost of an equivalent setup is usually quite high. alloc only creates a subnet for the local machine and provides no direct communication between subnets on different hosts.
Finally, when encapsulating a packet, how does flanneld know the IP address of the host where the destination container lives? flanneld watches etcd, so when other nodes write their network segment and host IP into etcd, every flanneld learns of it. When forwarding a packet to a container on another host, flanneld sends the data, addressed by that host's IP, to the flanneld on the corresponding host, which then forwards it to the destination container.

Installing flannel with Kubernetes

Since flannel has no master/slave distinction and an agent, flanneld, is installed on every node, we can use a Kubernetes DaemonSet to deploy flannel, which places one flanneld instance on each node.
Each flannel Pod runs one flanneld process, and the flanneld configuration file is mounted into the container at /etc/kube-flannel/ as a ConfigMap for flanneld to use. One of the key fields is Backend.Type, which indicates which mode flannel uses. We will introduce flannel's various backends in detail below.

Detailed explanation of flannel backends (the various modes of flannel)

flannel starts a process called flanneld on each node, which is responsible for dividing a subnet for the node and saving the relevant configuration (such as each node's subnet segment and external IP) in etcd, while the actual forwarding of network packets is handed to a backend.
At startup, flanneld can select a backend for network communication through its configuration file. Currently there are three mature backends: UDP, VXLAN, and host-gateway. VXLAN is the backend currently recommended officially; host-gateway is generally used in scenarios with high network-performance requirements, but needs support from the underlying network architecture; UDP is used for testing and for older Linux kernels that do not support VXLAN.

1. UDP

The UDP mode is the easiest to understand, so let us first use the UDP backend to explain, and then extend to the other modes. To use UDP mode, specify Backend.Type as udp in the flanneld configuration file, which can be done by directly editing flannel's ConfigMap.
In UDP mode, the flanneld process opens /dev/net/tun at startup and creates a tun device (readers unfamiliar with tun devices can consult an introductory article on them). A tun device can be understood simply as a mechanism Linux provides for communication between the kernel network stack and user space (applications): an application can send and receive raw IP packets by directly reading and writing the tun device.
After the flanneld process starts, the ip addr command shows an additional network interface, flannel0, on the node. ip -d link show flannel0 shows that it is a tun device, and that flannel0's MTU is 1472, 28 bytes less than that of the host interface eth0. netstat -ulnp | grep flanneld shows that flanneld listens on UDP port 8285.

Intra-node communication practice in flannel UDP mode:
Suppose there are two containers, A and B, on the same host, both with IP addresses taken from the node's container subnet (here Checking the routing information on the host, you can find a route like the following

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface     U     0      0        0 docker0

When container A sends a packet to container B on the same subnet, the two containers are on the same host and both are bridged onto docker0, so with the help of the docker0 bridge, direct communication between container A and container B on the same host is achieved.
Note: early versions of flannel directly used the docker0 bridge created by Docker, while later versions use CNI to create their own cni0 bridge. Whether it is docker0 or cni0, it is essentially a Linux bridge, and the function is the same.

Cross-host communication practice in flannel UDP mode:
Implementation of cross-host container communication: suppose container A ( is on node A and container B ( is on node B, and container A sends an ICMP request to container B. Let us analyze, step by step, the whole journey of the ICMP packet from container A to container B.
(The original flowchart is omitted here; the steps below describe the same flow.)

  1. The ICMP request sent by container A, after IP encapsulation, takes the form (source) -> (destination). According to the routing table inside container A, the packet should be sent to the gateway (the cni0 bridge). The Ethernet frame of the ICMP packet at this point is:
    MAC header: src: container A's MAC; dst: node A's cni0 MAC
    IP header: src: (container A); dst: (container B); the ICMP payload follows
  2. The destination address of the IP packet arriving at cni0 is, which matches the cluster-network route ( via flannel0 on node A. By querying the local routing table, the kernel knows the raw IP packet should be sent to the flannel0 interface.
  3. The flannel0 interface is a tun device, and the raw IP packet (with no MAC header) sent to flannel0 is received by the flanneld process. After receiving the raw IP packet, flanneld encapsulates it in UDP. The UDP packet takes the form {node A's IP}:{random port} -> {node B's IP}:8285.
    Note: the destination address is the IP of the host where container B ( lives, which flanneld obtains by querying etcd, and 8285 is flanneld's listening port.
  4. flanneld sends the encapsulated UDP packet out through eth0. From this we can see that before the packet leaves through eth0, a UDP header (8 bytes) and an IP header (20 bytes) are added for encapsulation. This is exactly why flannel0's MTU is 28 bytes smaller than eth0's: it prevents the encapsulated Ethernet frame from exceeding eth0's MTU and being discarded on the way out.
    The format of the fully encapsulated ICMP Ethernet frame is:
    MAC header: src: node A's MAC; dst: node B's MAC
    IP header: src: node A's IP (; dst: node B's IP
    UDP header: src: a random port; dst: port 8285
    followed by the payload (the original raw IP packet)
  5. The packet travels across the host network from node A to node B.
  6. After node B receives the UDP packet, the Linux kernel delivers it to the flanneld process listening on port 8285.
  7. flanneld on node B decapsulates the UDP packet and obtains the raw IP packet: ->
  8. The decapsulated raw IP packet matches the route for the local container subnet ( via cni0 on host B, so by querying the local routing table the kernel forwards it to the cni0 bridge. The Ethernet frame after decapsulation:
    MAC header: src: node B's cni0 MAC; dst: container B's MAC
    IP header: src: (container A); dst: (container B)
    followed by the ICMP payload
  9. The cni0 bridge forwards the IP packet to container B, which is attached to the bridge. At this point the whole process is complete; the reply travels back along the same path in reverse.

Throughout this process, the role of flannel is:

  • UDP encapsulation and decapsulation
  • Dynamically updating the routing table on the node

flannel's UDP encapsulation means that the original packet is UDP-wrapped by flanneld on the sending node, delivered to the destination node over the host network, and restored to the original packet by the flanneld on the other end. flannel refreshes the node's routing table according to the etcd data, and the routing information determines which destination node a packet should be delivered to.
It can be seen that although container A and container B are not directly connected on the host network, they logically appear to be on the same Layer 3 network. An upper-layer network like this, built on top of the underlying network devices by software-defined networking technology such as flannel, is called an overlay network.
The most obvious problem with the UDP backend is that each packet is first copied from the kernel to user space through the tun device, and then copied by the user-space application back into the kernel; a single transmission thus involves multiple switches between user mode and kernel mode on each host, which is clearly inefficient.


2. VXLAN

Before starting to discuss flannel's VXLAN mode, let us first understand VXLAN technology itself, which can be learned from the previous article.
flannel's use of VXLAN is relatively simple: since Kubernetes currently supports only a single network, there is only one VXLAN network over the Layer 3 network. You will therefore find that flannel creates one VXLAN interface named flannel.1 on each node of the cluster (the naming rule is flannel.[VNI], and the default VNI is 1). The VTEP MAC addresses are not learned through multicast, but discovered by watching Node objects on the API server. By contrast, the flannel0 created in UDP mode is a tun device.
You can view the configuration of the VTEP device flannel.1 with ip -d link. You can see that the local IP configured on flannel.1 belongs to the container network segment, and that flanneld uses the Linux VXLAN default port 8472 as the destination port of the outer UDP packet, rather than the IANA-assigned port 4789.
A doubt may arise here: in UDP mode flanneld performs the packet encapsulation and decapsulation, while in VXLAN mode that work is done by the kernel, so what is flanneld's role now? With this question in mind, let us briefly look at how flannel's VXLAN mode works.
When flanneld starts, it makes sure the VXLAN device exists: if it does not, flanneld creates it; if it already exists, creation is skipped. flanneld then reports the VTEP device's information to etcd. When a new node joins the flannel network and registers with etcd, flanneld on every node learns of it from etcd and performs the following steps:

  1. Create a route on the node for the new node's subnet, directing traffic for those Pods to the flannel.1 interface. route -n shows the relevant route on the node.
  2. Add a static ARP entry binding the new node's flannel.1 IP to its VTEP MAC; arp -n shows the ARP entries cached on the current node for the other nodes. The bridge fdb command shows the node's VXLAN forwarding table (FDB): in it, the MAC is the remote VTEP's MAC and the IP is the remote VTEP's external IP. (The external IP of a flannel VTEP can be specified with flanneld's startup parameter -iface=eth0; if unspecified, the IP of the interface used by the default-gateway route is chosen.)

VXLAN mode data path: in VXLAN mode, the data flow for cross-node communication in a flannel cluster is as follows
(The original data-flow diagram is omitted here.)

  1. As in UDP mode, the IP packet inside container A is sent to cni0 according to container A's routing table.
  2. The packet arriving at cni0, destined for, matches the route for on host A, which says such packets should be handed to the flannel.1 interface.
  3. As a VTEP device, flannel.1 encapsulates the packet according to its VTEP configuration. From the etcd data, flannel knows that belongs to node B and knows node B's IP address; from the forwarding table on node A it knows the MAC of node B's VTEP. It then performs VXLAN encapsulation with the parameters (VNI, local IP, port) set when the flannel.1 device was created.
  4. Over the network connection between host A and host B, the VXLAN packet reaches node B's network card.
  5. Arriving on port 8472, the VXLAN packet is handed to the VTEP device flannel.1 for decapsulation.
  6. The decapsulated IP packet matches the route for the local subnet ( on node B, and the kernel forwards it to cni0.
  7. cni0 forwards the IP packet to container B, which is attached to cni0.

In VXLAN mode, data is forwarded by the kernel; flannel does not forward packets itself, but only dynamically sets up the routing, ARP, and FDB entries.

The evolution of the VXLAN mode

flannel's VXLAN implementation has gone through three iterations:

  1. In the first version of flannel, L3 miss learning was done by looking up the MAC for a given IP to fill the ARP table, and L2 miss learning by obtaining the external IP address of the VTEP that owns a given MAC.

  2. The second version of flannel removed L3 miss learning: when a host comes online, the corresponding ARP entries are simply added directly, without miss-triggered lookup and learning.

  3. The latest version of flannel removes both L2 miss and L3 miss learning, and works as follows:

    1) Create the VXLAN device without listening for L2 miss or L3 miss events.
    2) Create static ARP entries for the remote hosts.
    3) Create FDB forwarding entries, mapping each remote VTEP's MAC to the remote host's external IP.

The latest version of flannel completely removed the L2 miss and L3 miss approach in favor of proactively adding routes for the remote hosts' subnets, with the VTEP and the bridge each assigned a Layer 3 IP address. When a packet reaches the destination host, it is routed internally at Layer 3, and the number of routes grows linearly with the number of hosts (not the number of containers). Officially, each host in the same VXLAN subnet corresponds to one routing table entry, one ARP entry, and one FDB entry.

3. Host Gateway

Host Gateway is abbreviated host-gw. As the name suggests, this mode achieves cross-node network communication by using the host as a gateway. As with the UDP and VXLAN modes, using host-gw requires setting Backend.Type in flannel's backend configuration to "host-gw".
The transmission process of a network packet with the host-gw backend is as follows
(The original figure is omitted here.)

  1. As in the UDP and VXLAN modes, the IP packet reaches cni0 through container A's routing table.
  2. The IP packet arriving at cni0 matches the route for container B's subnet ( on host A. The gateway of that route is host B's IP, so the kernel sends the packet directly to host B.
  3. The IP packet reaches host B's eth1 through the physical network.
  4. The IP packet arriving at host B's eth1 matches the route for the local container subnet ( via cni0 on host B, and the kernel forwards it to cni0.
  5. cni0 forwards the IP packet to container B, which is attached to cni0.

In host-gw mode, cross-node communication is achieved through the routing tables on the nodes, so the hosts of the two communicating parties must be directly routable to each other. This requires that, in flannel's host-gw mode, all nodes in the cluster be on the same Layer 2 network, a restriction that makes host-gw unsuitable for large clusters whose nodes need to be segmented. Another limitation of host-gw is that as the number of nodes grows, having flanneld keep thousands of routes on each host dynamically updated becomes a considerable burden; in routing mode, the number of routing rules is thus an important factor limiting network scale. We will return to this topic when we discuss Calico.
After adopting host-gw mode, flanneld's only role is to keep the routing table on the host dynamically updated.

4. Flannel and etcd

flannel needs etcd to store its network metadata. Readers may wonder what data flannel saves in etcd, and under which keys. The etcd storage path has changed several times over flannel's versions, so searching the whole etcd keyspace for the flannel keyword is the more reliable approach, as shown below

etcdctl get "" --prefix --keys-only | grep -Ev "^$" | grep "flannel"

Besides flannel, the keys of Calico's and Canal's configuration data in etcd can be queried in the same way

flannel summary

flannel configures a Layer 3 overlay network, creating a large internal network that spans every node in the cluster. Within this overlay network, each node holds a subnet used to allocate IP addresses internally. As Pods are created, the bridge interface on each node assigns an address to each newly created container. Pods on the same host communicate through that bridge, while traffic between Pods on different hosts is encapsulated (in UDP or VXLAN packets, depending on the backend) so that it can be routed to the appropriate destination. flannel provides several backend types for encapsulation and routing; it has evolved through UDP, VXLAN, and host-gateway, with efficiency improving at each step. All of this is inseparable from the Linux design philosophy: less is more.
Overall, flannel is a good choice for most users. From a management point of view it offers a simple network model, and users with some basic knowledge can set up an environment suitable for most use cases. Compared with other solutions, flannel is relatively easy to install and configure, and many common Kubernetes cluster deployment tools and distributions install flannel by default. In the early days of learning Kubernetes, using flannel is a safe and wise choice, until you start to need something it cannot provide (such as network policy in Kubernetes).

(This article consists of notes taken while studying the book "The Authoritative Guide to Kubernetes Networking". As I have only just started learning k8s, my self-study method is to retype the material and add some understanding of my own, so you will find that much of the content is identical to the book's. If this article infringes on that book, please understand that it will be deleted upon notification. If you quote or reprint it, please indicate the source. Thank you.)
