Feb 24 2022

Going multipath without Multipath TCP

title card

Gigabit ethernet has been around for a long time. It’s so ubiquitous that if you have an RJ-45 port on your computer, there is a very strong chance that it’s a gigabit ethernet network interface.

Even if you look at computers that are over 20 years old, the only thing that stands out on their spec sheets as still being current is gigabit ethernet.

However, gigabit ethernet (1GBE) sometimes is just not enough these days. Gigabit residential internet access is becoming more and more common, and while most of those consumers are using WiFi and are unlikely to ever get the gigabit performance they were expecting, a wired connection could easily max out both the internet and LAN link.

10 Gigabit ethernet (10GBE) is creeping into the consumer market slowly. Notably, Apple products have been shipping with 10 gigabit ethernet for a while now, but it’s still rare to find people with 10 gigabit ethernet switches.

I suspect most of the reason consumer ethernet speeds have not improved in the last 20 years is that, for the most part, it’s already fast enough. Storage, however, has sped up remarkably: not only are disks faster (the average SATA hard disk will run at around 2.5 gigabits), but flash storage has given us single drive speeds of 6 gigabits, and more if you look at NVMe (mine can read at nearly 20 gigabits). When you look at what we can send to other computers (think remote storage like NASes), we can quickly get limited by 1GBE.

Servers are a different story. While there are still a considerable number of servers running on 1GBE, 10GBE and 25GBE (or faster!) are used for anything bandwidth intensive, since the network can quickly become the major bottleneck when faced with large compute power and storage capacity.

Enter Link Aggregation

However single 10GBE and 25GBE links are not always what you want. What if you want more bandwidth? What if the switch you are attached to needs a software upgrade or crashes? For this reason, a lot of the NICs for servers have two physical ports that are connected to the same switch (for a bandwidth increase) or different switches (for redundancy/failover).

But how do systems make use of multiple ethernet links? They program their switches and the servers to use link aggregation (LAG). Link aggregation means that both links share a single IP and MAC address, and requires the switch and system (at least in the commonly deployed 802.3ad standard) to change their behaviour. Link aggregation goes by different names depending on the vendor you are configuring it on. Names include Aggregate Interfaces, NIC Teaming, Port Channels, and Bonds, the last one causing amusing blog titles from time to time (A)

However, link aggregation does not directly cause performance increases for single connections. This is because the OS and network layer typically direct a connection down a single ethernet link at a time, since TCP and other protocols that do congestion control could become confused when they are presented with inconsistent performance feedback (as different links have different capacities and latencies). While a 2x10GBE LAG may mean that you have 20 gigabits of bandwidth available to you, a single TCP connection will only run at 10 gigabits due to the connection directing/hashing logic.
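To make the hashing point concrete, here is a minimal Go sketch of how a LAG might pick a member link for a connection. The `flow` type, the `pickLink` function and the use of FNV are all assumptions for illustration; real switches and bonding drivers use their own (often configurable) hash over similar fields.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// flow identifies one connection by the fields a layer 3+4 hash would look at.
type flow struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// pickLink maps a flow onto one of n member links. FNV is only a stand-in
// for whatever hash the real switch or driver applies.
func pickLink(f flow, n int) int {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s|%s|%d|%d", f.srcIP, f.dstIP, f.srcPort, f.dstPort)
	return int(h.Sum32() % uint32(n))
}

func main() {
	backup := flow{"192.0.2.10", "192.0.2.20", 40000, 2222}
	// Every packet of one connection carries the same tuple, so it always
	// hashes to the same link, however much data is sent.
	fmt.Println("backup stream uses link", pickLink(backup, 2))
	fmt.Println("backup stream uses link", pickLink(backup, 2))

	// Other connections differ in source port, so they may land elsewhere.
	for port := uint16(40001); port <= 40004; port++ {
		f := flow{"192.0.2.10", "192.0.2.20", port, 2222}
		fmt.Printf("port %d -> link %d\n", port, pickLink(f, 2))
	}
}
```

The takeaway is that more connections can spread across the bundle, but one connection on its own never gets more than one link.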

Need for (single connection) speed

LTO 6 drive

In a previous post I talked about LTO tape backups and how the drives themselves could read/write faster than a standard gigabit ethernet link, and that 10 gigabit networking is recommended if you are streaming data to a tape over the network to avoid issues.

Sadly, in my case the SAS card required for running the tape drive consumed the last PCIe slot in its machine, a slot that could otherwise have held a 10GBE network card, even though the systems feeding the PC hosting the tape drive were on 10GBE. This meant that my tape backups were far slower (and stalling on mbuffer) than they needed to be.

But what I did have was USB 3.0 ports, and where there are USB 3 ports there is the option to use USB gigabit ethernet dongles.

The issue, however, is that while I could have set the motherboard NIC and the USB NIC into a LAG, that would not have improved the speed of my single TCP connection feeding the tape drive (via mbuffer).

I really needed a way to combine the throughput of both NICs into one 2 Gbit/s stream.

Going multipath with MPTCP

MPTCP Wireshark

Multipath TCP (MPTCP), specified in RFC 8684, is an extension that allows a single TCP socket to span across multiple IP addresses and network interfaces.

This extension is currently used sparsely, with the only two commonly deployed uses being the OpenMPTCProuter project and Apple’s Siri.

OpenMPTCProuter uses MPTCP to proxy/tunnel connections for better throughput, allowing you to chain multiple residential connections into one faster link to a proxy server, while Siri uses it to handle rapid failover between WiFi and Cellular to ensure the best experience when using the voice assistant.

MPTCP was merged into the Linux kernel in 5.6; however, I do not know of any mainstream distributions that have it present and enabled. Ubuntu 20.10 ships with MPTCP on its 5.13 kernel, while Debian Bullseye uses 5.10 but has MPTCP disabled.

$ cat /boot/config-5.10.0-11-amd64 | grep MPTC
# CONFIG_MPTCP is not set
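The same check can be scripted. This is a minimal Go sketch that takes the text of a kernel config and reports whether MPTCP was built in. The function name `mptcpEnabled` is made up for this example, and in real use you would feed it the contents of the `/boot/config-*` file for your running kernel rather than the sample strings below.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mptcpEnabled reports whether a kernel config text contains CONFIG_MPTCP=y.
// A line such as "# CONFIG_MPTCP is not set" does not count.
func mptcpEnabled(config string) bool {
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "CONFIG_MPTCP=y" {
			return true
		}
	}
	return false
}

func main() {
	// Sample inputs: the first mirrors the Debian Bullseye output above.
	fmt.Println(mptcpEnabled("# CONFIG_MPTCP is not set\n"))
	fmt.Println(mptcpEnabled("CONFIG_MPTCP=y\nCONFIG_MPTCP_IPV6=y\n"))
}
```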

Outside of having MPTCP support available in your OS, I have found the MPTCP usability story… bad? It feels really bad to say this about something, but my own research into MPTCP has been maddening.

This post was written on Feb 24th 2022. The situation may have changed by the time you read this.

Although MPTCP is now shipping in mainline Linux, the project website itself appears not to have kept up. The docs on the site sent me on wild goose chases, only for me to find that the things written down are not supported anymore, or maybe I just have not found any way to do the things they have documented. I may just honestly be stupid with this, but I found the best living documentation to be the kernel tests for MPTCP, since they by definition have to reflect the current API for MPTCP.

I think in general a lot of the pain that comes with this is that MPTCP is designed to automatically detect and begin multipathing traffic in a way where the user space has its hands off the details of the connection.

Because of this, I could not find a way to tell the kernel from user space (ignoring netlink etc) about multiple endpoints for a host. For this reason I gave up my attempt to write an MPTCP client tool.

DIY Multipath TCP

Not satisfied with MPTCP, I figured that an entirely userspace version of this concept is possible, and in fact someone I was doing contract work with at the time had a library that appeared to be able to “bond” together multiple connections. Upon trying to get it working, however, I found that the library did not achieve speed improvements, and failover behaviour was unpredictable at best.

So I worked on making sure it could, overhauling the library and pushing fixes to it. Once it worked well enough, and I reached the point where I was making decisions that could break the library, I made my own fork to contain the behaviour changes.

The multipath Go library is actually quite an involved bit of machinery, because while MPTCP has raw packet access to do things with retries and subflow sorting, multipath does not. It takes what is basically an array of net.Conns and teams them together for bandwidth and resilience. This means that anything that conforms to net.Conn in Go and is an ordered reliable stream (for example WebSockets, TLS connections, SCTP in some modes, etc) can be used in this library, all combined!

Due to the usage of TCP-like sockets, the performance will never be as good as if you wrote this using UDP yourself, but given that is the way the multipath library started, I was determined to keep it that way even after I forked it.

Now that I had the library working though, I still needed a tool that wrapped it for day to day use…

Introducing bondcat

bondcat mascot

To wrap this all together I made a new utility: bondcat.

Bondcat has a user interface inspired by ncat, but can connect to a host on multiple IP address/port combos. This means that with some knowledge of address selection (see below) you can easily, and in a cross platform way, beat the limits of single gigabit ethernet speeds.

I mention gigabit ethernet in particular because I have not attempted to optimise the multipath library to go faster than 10 gigabit/s. This is because at that stage there are likely better options for moving very large amounts of data over very fast LAG’d networks, for example GridFTP.

But since it acts like netcat/ncat, you can easily wrap connections over it. For example, you could use OpenSSH’s ProxyCommand to obtain faster than gigabit SFTP/SCP transfers:

[11:13:24] ben@metropolis:~$ cat .ssh/config 
Host tapedrive
  ControlMaster no
  ProxyCommand bondcat 192.168.XXX.XXX:2222 192.168.XXX.XXX:2222

For the listening side, bondcat includes a -relay mode that accepts connections and forwards the data to another TCP endpoint. This means that to make this SSH setup work we can point the relay mode at the local SSH server:

$ bondcat -relay 127.0.0.1:22 -l -p 2222

If that is not your usage style, you can always just use it as a regular netcat to send stuff around:

netcat bondcat

Now there are some considerations to be made for bondcat. Since MPTCP has a far better idea of your network stack than bondcat ever can (after all, the library doing the magic just sees a bunch of connections, nothing more), you need to be careful about what addresses you select for your use case:

Use Case: Going over the internet or trying to escape the limits of LAG bundles

This use is simple: you just need to invoke it as you would a normal netcat (single address) and use the -multiplier flag to set how many extra connections you want to start. This flag also works with other addresses if you want to mix the host’s IPv4 and IPv6 addresses.

Use Case: Faster LAN transfers

I find it’s easiest to target IPv6 addresses for LANs, but that assumes the LAN you are on has Router Advertisements. If it does, then it’s the best option. If not, then the local IPv4 address normally works fine.

However, this will generally only work in the 10GBE->{n}x1GBE setup. Bondcat by default tries to connect with every IP address on the machine to aid automatic speed boosts. To help this “automagically” work, it’s best to start the connections from the machine that has the most interfaces. This function can be disabled with -a or -no-auto-detect.

Use Case: Backup link failover

Assuming the system you are on has two links, you will need to manually add a route for one of the endpoints to go over your backup link. Other than that, it should transparently work.

You can pick up a copy of bondcat on GitHub: https://github.com/benjojo/bondcat

I found it very useful for streaming backups to my tape drive, and I’m sure the library itself (the forked version is in the same code repo) would find uses outside of bondcat. However, I assume that as MPTCP gets better and more supported (with any luck) the tool will slowly become obsolete.

If you want to stay up to date with the blog you can use the RSS feed or you can follow me on Twitter

Also, I’m currently looking for work from March onwards. If you like what I do or think that you could do with some of my bizarre areas of knowledge, please contact me over at workwith@benjojo.co.uk!

Until next time!
