
Oct 12 2019

You cannot cURL under pressure

cURL. The wonderful HTTP plumbing tool that not only powers a lot of command line debugging and bash scripts, but also sits as a strong foundation inside our applications in the form of libcurl.

The scale of adoption of libcurl / curl is actually remarkable. It’s mostly kept up to date with new protocols too, ensuring that its relevance is maintained. Putting it bluntly, there is not really a better tool in unix(-like) land for making HTTP/HTTPS requests than cURL. (For Windows I guess it’s 50/50 with WinHTTP, and for OSX it’s URLSession.)

But the scope creep of cURL is also something to behold; the program can do tons of stuff! Just look at the home page!

DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP,
IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP,
SFTP, SMB, SMBS, SMTP, SMTPS, Telnet and TFTP.

curl supports SSL certificates, HTTP POST, HTTP PUT,
FTP uploading, HTTP form based upload, proxies, HTTP/2,
cookies, user+password authentication
(Basic, Plain, Digest, CRAM-MD5, NTLM, Negotiate and Kerberos),
file transfer resume, proxy tunneling and more.

Holy shit! That’s a lot of RFCs that have been implemented. The help pages of a fully loaded cURL install are, understandably, very long:

# curl --help | wc -l
221

# curl --manual | wc -l
4596

cURL having this many features (the general mass of them being totally unknown to me, let alone how you use them) got me thinking… What if you could do a game show style challenge for them?

This brought back memories of the game that the now defunct UsVsTh3m made called “You can’t javascript under pressure”.

a gif of the game

Sadly UsVsTh3m’s content is gone, but the Internet Archive holds a functioning copy of it:

http://games.usvsth3m.com/javascript-under-pressure/

I thought… Could you turn the curl manual page into a game?

Well, no need to just think about it anymore. I present to you:

you cannot curl under pressure logo

Interested in how this works?

Below is a technical write up of the infrastructure behind this game. If you are stopping here, feel free to keep track of my future shenanigans using my blog’s RSS or by following me on Twitter.


The infrastructure

The game itself is actually quite basic, it boils down to this flow diagram:

basic flow diagram

To prevent users from tripping over themselves (or each other) and making their environment dirty, every time a user completes a challenge they are switched to a new VM:

showing the almost invisible switch over in VMs

The way this works internally is that a single vm-router takes in the main websocket connection and holds the game state with the client. It then talks to a number of copies of a test-supervisor.

test-supervisor holds a number of booted VMs so that it can serve demand quickly. When a challenge is over, the VM that user was on is killed along with its filesystem, and a new VM is booted in its place to prevent cross contamination.

top down view of the web game
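The real code is linked at the bottom of the post; the sketch below is just to give a feel for the pool idea in Go. The names (bootVM, runPool) and the exact qemu flags here are illustrative guesses, not lifted from the real backend.

package main

import (
	"log"
	"os/exec"
)

// A VM that has already been booted and is waiting to be handed to a player.
type vm struct {
	cmd *exec.Cmd
}

// bootVM starts a qemu microVM from the buildroot kernel + rootfs.
// -snapshot makes all disk writes go to a throwaway temp file, so killing
// the process also throws away the dirty filesystem.
func bootVM() (*vm, error) {
	cmd := exec.Command("qemu-system-x86_64",
		"-m", "64",
		"-kernel", "bzImage",
		"-drive", "file=rootfs.ext2,format=raw,if=virtio",
		"-append", "root=/dev/vda console=ttyS0 quiet",
		"-netdev", "user,id=n0", "-device", "virtio-net-pci,netdev=n0",
		"-nographic", "-snapshot")
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return &vm{cmd: cmd}, nil
}

// runPool keeps `size` VMs booted and ready; taking one off the channel
// unblocks the goroutine, which immediately boots a replacement.
func runPool(size int) chan *vm {
	pool := make(chan *vm, size)
	go func() {
		for {
			v, err := bootVM()
			if err != nil {
				log.Fatal(err)
			}
			pool <- v
		}
	}()
	return pool
}

func main() {
	pool := runPool(5)
	v := <-pool // a new challenge starts: hand out a pre-booted VM
	log.Printf("VM pid %d in use", v.cmd.Process.Pid)
	v.cmd.Process.Kill() // challenge over: kill the VM and its filesystem
}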

Controlling Load

I’ve had a nasty habit in the past of making blog posts with interactive elements whose backends can’t keep up with the incoming load (even though the blog’s 100% cache hit rate keeps going).

The most recent case of this was when I wrote about a QEMU vulnerability on my own site, and the visitors rushing to try it out had me asking friends for compute capacity in a hurry to keep up with demand.

Since this game requires cycling through a lot of VMs, the ‘slashdot effect’ could genuinely be lethal for this post, so I’ve spent time this round making sure things run smoothly.

One of the big bottlenecks is booting VMs. I spent a considerable amount of time tweaking both the qemu parameters and the Linux kernel config to cut out any drivers or anything else slowing down the path from power-on to a usable shell.

I ended up taking dmesg dumps and moving them into spreadsheets to highlight stages that were taking a long time.

the flawless dmesg spreadsheet

This is not a foolproof method, as the kernel does do some stuff asynchronously during boot. However, in my case it took a 5 second boot-to-shell down to a 2.3 second one.
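If you want to skip the spreadsheet step, the same analysis can be done with a few lines of Go: parse the bracketed dmesg timestamps and print the gap between consecutive lines. This is a rough sketch of the idea rather than the exact tooling I used; run it as dmesg | go run gaps.go inside the guest.

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
)

// Reads dmesg output on stdin and flags any stage where the gap to the
// previous log line is over 100ms - the same "where did the time go"
// question the spreadsheet was answering.
func main() {
	tsRe := regexp.MustCompile(`^\[\s*(\d+\.\d+)\]`)
	prev := 0.0
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		line := sc.Text()
		m := tsRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		ts, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue
		}
		if gap := ts - prev; gap > 0.1 {
			fmt.Printf("%6.3fs gap before: %s\n", gap, line)
		}
		prev = ts
	}
}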

The changes I ended up making were mostly to the kernel config, cutting out drivers and subsystems the VM never uses, plus trimming the qemu flags.

I found Feng Tang’s talk slides from the Linux Plumbers Conference enlightening for getting the boot time down.

The 2.3 second boot time was without KVM acceleration. I didn’t want to enable KVM, since I was nervous about handing that kind of access to possibly random users on the internet. With KVM acceleration, boot to a usable shell was roughly 600 milliseconds.

Given that I had built all of this with buildroot, I ended up with a nice small rootfs and kernel image at the end of it:

root@yccup:/home/proxytest# ls -alh rootfs.ext2 bzImage 
-rw-r--r-- 1 root root 5.9M Oct  5 22:30 bzImage
-rw-r--r-- 1 root root  20M Oct  5 22:30 rootfs.ext2

Catching connections

Since these small VMs fundamentally need network access to work correctly, some creativity was needed to ensure that things remain both secure (so that players can’t sabotage each other) and easy to manage (I need to be able to attribute connections to individual VMs).

While the usual solution for VM networking is TUN/TAP devices attached to a bridge, that would require giving the qemu processes root for a short amount of time, something I didn’t really want to do in case the qemu layer was breached.

Instead I settled on using qemu’s User Networking (SLIRP) and the iptables REDIRECT target to send all outgoing connections from qemu back into the test-supervisor for handling.

connection flow with iptables redirect

The iptables rules to do this are pretty simple, and I’ve used this kind of setup before on my Giving every Tor Hidden Service an IPv6 address post.

iptables -t nat -N proxytest
iptables -t nat -A proxytest -d 192.0.2.0/24 -p tcp --dport 80 -j REDIRECT --to-ports 9999
iptables -t nat -A proxytest -d 192.0.2.0/24 -p tcp --dport 443 -j REDIRECT --to-ports 9998
iptables -t nat -A proxytest -d 192.0.2.0/24 -p tcp --dport 25 -j REDIRECT --to-ports 9996
iptables -t nat -A proxytest -d 192.0.2.0/24 -p tcp --dport 21 -j REDIRECT --to-ports 9995
iptables -t nat -A proxytest -d 192.0.2.0/24 -p tcp --dport 1337 -j REDIRECT --to-ports 9993
iptables -t nat -A proxytest -p tcp -j REDIRECT --to-ports 9997
iptables -t nat -A proxytest -p udp -j REDIRECT --to-ports 9997 

iptables -t nat -A OUTPUT -p tcp -m owner --uid-owner proxytest -j proxytest
iptables -t nat -A OUTPUT -p udp -m owner --uid-owner proxytest -j proxytest
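On the receiving side, the listeners that catch these redirected connections still need to know where the VM was actually trying to go, especially for the catch-all rules that dump everything else onto port 9997. The standard Linux trick for that is the SO_ORIGINAL_DST socket option, which survives the REDIRECT. Below is a sketch of reading it back in Go; I’m not claiming this is exactly how the real backend does it, but it’s the usual approach.

package main

import (
	"fmt"
	"log"
	"net"
	"syscall"
)

const SO_ORIGINAL_DST = 80 // from <linux/netfilter_ipv4.h>

// originalDst recovers the address a REDIRECTed TCP connection was
// originally trying to reach, via the netfilter SO_ORIGINAL_DST option.
func originalDst(c *net.TCPConn) (string, error) {
	raw, err := c.SyscallConn()
	if err != nil {
		return "", err
	}
	var dst string
	var sockErr error
	err = raw.Control(func(fd uintptr) {
		// SO_ORIGINAL_DST returns a struct sockaddr_in; the IPv6Mreq
		// getter is the usual Go trick for reading its raw bytes.
		addr, e := syscall.GetsockoptIPv6Mreq(int(fd), syscall.IPPROTO_IP, SO_ORIGINAL_DST)
		if e != nil {
			sockErr = e
			return
		}
		port := int(addr.Multiaddr[2])<<8 | int(addr.Multiaddr[3])
		ip := net.IPv4(addr.Multiaddr[4], addr.Multiaddr[5], addr.Multiaddr[6], addr.Multiaddr[7])
		dst = fmt.Sprintf("%s:%d", ip, port)
	})
	if err != nil {
		return "", err
	}
	return dst, sockErr
}

func main() {
	// Listen where the catch-all REDIRECT rules above send traffic.
	l, err := net.Listen("tcp", ":9997")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := l.Accept()
		if err != nil {
			log.Fatal(err)
		}
		dst, err := originalDst(conn.(*net.TCPConn))
		log.Printf("redirected connection, original destination: %s (err=%v)", dst, err)
		conn.Close()
	}
}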

You can then use /proc/<pid>/fd to track down the owner of a connection and pin it to a single VM process. Using that, we can check what challenge that player is supposed to be doing and route them over to the correct testing logic.
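The fd walk itself is simple enough; here is a sketch of the half that maps a supervised qemu PID to the socket inodes it holds open (the symlinks read like socket:[123456]). Matching those inodes against the inode column of /proc/net/tcp for the incoming connection’s peer is what ties a connection back to one VM; that lookup is left out here, and the function name is illustrative.

package main

import (
	"fmt"
	"os"
	"regexp"
)

var sockRe = regexp.MustCompile(`^socket:\[(\d+)\]$`)

// socketInodesFor returns the socket inodes held open by a given PID,
// by reading the /proc/<pid>/fd symlinks.
func socketInodesFor(pid int) ([]string, error) {
	dir := fmt.Sprintf("/proc/%d/fd", pid)
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var inodes []string
	for _, e := range entries {
		target, err := os.Readlink(dir + "/" + e.Name())
		if err != nil {
			continue // the fd may have closed under us
		}
		if m := sockRe.FindStringSubmatch(target); m != nil {
			inodes = append(inodes, m[1])
		}
	}
	return inodes, nil
}

func main() {
	inodes, err := socketInodesFor(os.Getpid())
	if err != nil {
		panic(err)
	}
	fmt.Println("socket inodes:", inodes)
}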

This logic can get a little ugly with FTP, a protocol that deals with multiple connections (one control socket and another data socket).

Auto scaling

Since this system could easily get overwhelmed, the vm-router has the ability to spawn more systems in case of dire need. This works with a little homemade PDU-controlled DHCP/PXE boot cluster I made a while back but then had no use for:

the html tables UI of my mini cloud

Small brick-sized PCs are started and shut down automatically to meet demand should there be a traffic surge.
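The control loop for that is nothing fancy. Here is a purely illustrative sketch of the shape of it; the PDU endpoint, node names and the ready-VM gauge are all made up, since the real thing is a homemade PDU plus PXE setup.

package main

import (
	"log"
	"net/http"
	"time"
)

// powerOn flips the outlet for a node on via a hypothetical PDU HTTP API.
func powerOn(node string) error {
	_, err := http.Post("http://pdu.local/outlet/"+node+"/on", "text/plain", nil)
	return err
}

// autoscale watches how many booted VMs are ready and powers on another
// machine from the spare list whenever the pool runs low.
func autoscale(ready func() int, spare []string) {
	for range time.Tick(30 * time.Second) {
		if ready() < 2 && len(spare) > 0 {
			node := spare[0]
			spare = spare[1:]
			log.Printf("pool is low, powering on %s", node)
			if err := powerOn(node); err != nil {
				log.Printf("failed to power on %s: %v", node, err)
			}
		}
	}
}

func main() {
	readyVMs := func() int { return 1 } // stand-in for the real pool gauge
	autoscale(readyVMs, []string{"brick-1", "brick-2"})
}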


Like most of my blog posts, you can find the code that powers the backend on my github:

https://github.com/benjojo/you-cant-curl-under-pressure

If you enjoyed this, you may enjoy the rest of the blog. You can keep up to date with both the serious and non-serious stuff I do by using the RSS feed or by following me on Twitter.

Until next time!