Introduction
For a long while, we’ve known that forwarding traffic through a traditional kernel networking stack such as Linux’s or BSD’s won’t cut it if you intend to do high-performance packet forwarding.
Looking into alternatives, it turns out that Intel’s DPDK framework for fast x86 packet processing has received considerable work since I last looked at it.
There are commercial solutions from 6WIND, as well as various open-source projects such as the Linux Foundation’s FD.io (whose VPP we use here). However, performance numbers for real-world implementations are scarce, so I set out to gather some from a setup of my own.
This post therefore aims to evaluate the forwarding performance of such a solution against that of a traditional Linux 4.18 kernel.
Hardware Overview
This test uses three nodes fitted with Intel X520-DA2 and Mellanox ConnectX-2 cards, connected via 10GBASE-SR transceivers over short runs (<5m per link), as described below:
Node 1:
Intel Xeon E3-1230V3 (iperf client node)
32GB DDR3 1866MHz
Debian Linux 9
Mellanox ConnectX-2
Assigned IP: 192.168.0.0
Node 2:
Intel Xeon E3-1230V3 (router running VPP+FRR)
32GB DDR3 1866MHz
Debian Linux 9
Intel X520-DA2
Assigned IP: 192.168.0.1
Node 3:
AMD Ryzen 2600X (iperf server node)
32GB DDR4 3200MHz
Debian Linux 9
Intel X520-DA2
Assigned IP: 192.168.0.2
MTUs are left at 1500, since raising the MTU is not always an option in networks the customer does not fully control. Keeping the MTU small also keeps the test more interesting, as a 9000-byte MTU has been shown to hit 10Gbit/s with little problem.
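On the Linux-driven endpoints (nodes 1 and 3) this is simply the default, but it can be confirmed, or pinned back, with iproute2. The interface name below is a placeholder, not the actual device name on our nodes:

# check the MTU on the test-facing port (interface name is a placeholder)
ip link show dev enp3s0f0
# set it back to 1500 explicitly if it had been changed
ip link set dev enp3s0f0 mtu 1500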
Configuring the Test Setup
The FRR wiki has a detailed guide on configuring FRR to work with VPP: https://github.com/FRRouting/frr/wiki/Alternate-forwarding-planes:-VPP
We follow that guide to set up our system, with one key difference: we do not use Vagrant. Everything is configured directly on the host node’s OS.
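For reference, and without reproducing the guide, the bare-metal part of the setup mostly comes down to handing the X520’s two ports to VPP via DPDK in /etc/vpp/startup.conf and checking that VPP picks them up. This is a minimal sketch: the PCI addresses are placeholders for whatever lspci reports on node 2, not our exact values.

# /etc/vpp/startup.conf (excerpt) -- PCI addresses are placeholders
dpdk {
  dev 0000:02:00.0
  dev 0000:02:00.1
}
# after restarting the vpp service, both ports should be listed here
vppctl show interface

From there, the interfaces are wired up to FRR exactly as the guide describes.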
For routing, we use a simple BGP configuration between nodes 1, 2 and 3:
Node 1:
router bgp 64600
network 192.168.0.0/32
neighbor 192.168.0.1 remote-as 64601
Node 2:
router bgp 64601
network 192.168.0.1/32
neighbor 192.168.0.0 remote-as 64600
neighbor 192.168.0.2 remote-as 64602
Node 3:
router bgp 64602
network 192.168.0.2/32
neighbor 192.168.0.1 remote-as 64601
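Before pushing any traffic, it is worth confirming on node 2 that both sessions are up and that the /32s from nodes 1 and 3 have been learnt. Something along these lines does the job (shown as one-shot vtysh commands):

# on node 2: both neighbours should show as Established
vtysh -c "show ip bgp summary"
# the /32 prefixes advertised by nodes 1 and 3 should be present
vtysh -c "show ip route bgp"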
And then it’s time to set up the throughput testing on nodes 1 and 3.
We disable net.ipv4.ip_forward in the Linux kernel on node 2 to ensure that the kernel networking stack is not doing the forwarding in this test.
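Concretely, that is just a sysctl on node 2 (plus the matching line in /etc/sysctl.conf if it should survive a reboot):

# node 2: make sure the kernel itself never forwards between the ports
sysctl -w net.ipv4.ip_forward=0
# confirm it took effect
sysctl net.ipv4.ip_forward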
Test Results and Methodology
We use iperf3 with 5 parallel streams over TCP to confirm our throughput. Next, we switch to UDP and gradually increase the offered bandwidth to test for dropped packets.
For the TCP test, all default options are used with no modifications to iperf’s buffer length or window size.
For the UDP tests, only the datagram size is varied: we start with the default of 1470 bytes and reduce it to 512 and then 64 bytes.
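The exact command lines are not reproduced above, but the runs boil down to iperf3 invocations of roughly this shape; the per-stream UDP target rate shown here is illustrative and was stepped up between runs:

# node 3: run the server
iperf3 -s
# node 1: TCP, 5 parallel streams
iperf3 -c 192.168.0.2 -P 5
# node 1: UDP, 512-byte datagrams, 5 streams, ~2 Gbit/s offered per stream
iperf3 -c 192.168.0.2 -u -l 512 -b 2G -P 5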
Test 1 (TCP, 5 threads):
[SUM] 0.00 – 10.00 sec 10.9 GBytes 9.34 Gbits/sec
Test 2 (UDP 1470 bytes, 5 threads):
[SUM] 0.00 – 10.00 sec 10.6 GBytes 9.28 Gbits/sec
Test 3 (UDP 512 bytes, 5 threads):
[SUM] 0.00 – 10.00 sec 9.8 GBytes 9.02 Gbits/sec
Hmm… let us try increasing the number of threads.
Test 4 (UDP 512 bytes, 10 threads):
[SUM] 0.00 – 10.00 sec 10.3 GBytes 9.22 Gbits/sec
Test 5 (UDP 64 bytes, 5 threads):
[SUM] 0.00 – 10.00 sec 9.6 GBytes 8.95 Gbits/sec
Test 6 (UDP 64 bytes, 10 threads):
[SUM] 0.00 – 10.00 sec 10.0 GBytes 8.96 Gbits/sec
Looks promising. Let us now run the same tests with plain Linux kernel forwarding as a baseline, so we can make the comparison.
Linux Baseline Test Results
Test 1 (TCP, 5 threads):
[SUM] 0.00 – 10.00 sec 5.8 GBytes 4.27 Gbits/sec
Test 2 (UDP 1470 bytes, 5 threads):
[SUM] 0.00 – 10.00 sec 7.12 GBytes 6.42 Gbits/sec
Test 3 (UDP 512 bytes, 5 threads):
[SUM] 0.00 – 10.00 sec 3.54 GBytes 2.88 Gbits/sec
Test 4 (UDP 512 bytes, 10 threads):
[SUM] 0.00 – 10.00 sec 4.62 GBytes 3.16 Gbits/sec
Test 5 (UDP 64 bytes, 5 threads):
[SUM] 0.00 – 10.00 sec 2.17 GBytes 1.82 Gbits/sec
Test 6 (UDP 64 bytes, 10 threads):
[SUM] 0.00 – 10.00 sec 2.98 GBytes 2.01 Gbits/sec
As we can see, VPP greatly increases packet forwarding performance, running very close to line rate and quite possibly capable of going further. Unfortunately, I do not have the hardware to test beyond 10G at the moment, and the nodes generating the iperf traffic would likely need a substantial upgrade before any such testing could be done.
Closing Thoughts
This is my first time doing performance testing of this sort, so any feedback on how my test methodology could be improved is welcome.