Linux container bug could eat your server from the inside – patch now!

If you’re a fan of retro gaming, you’ve probably used an emulator, which is a software program that runs on computer hardware of one sort, and pretends to be a computer system of its own, possibly of a completely different sort.

That’s how your latest-model Mac, which has an Intel x64 CPU, can run original, unaltered software that was written for a computer such as the Apple ][ or the Nintendo GameBoy.

One advantage of emulators is that even though the running program thinks it’s running exactly as it would in real life, it isn’t – everything it does is controlled, instrumented, regimented and mitigated by the emulator software.

When your 1980s Sinclair Spectrum game accidentally corrupts system memory and crashes the whole system, it doesn’t really crash anything at the hardware level, and the crashed pseudo-computer doesn’t affect the operating system or the hardware on which the emulator is running.

Similarly, if you’re an anti-virus program “running” some sort of malicious code inside an emulator, you get to watch what it does every step of the way, and even to egg it on to show its hand…

…without letting it actually do anything dangerous in real life, because it’s not accessing the real operating system, the real computer hardware or the real network at all.
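
To make the idea concrete, here is a deliberately tiny sketch in C of how an emulator stays in charge of its guest. The three-instruction "CPU", its opcodes and all the names below are invented for this illustration and don't correspond to any real machine. The point is simply that every guest instruction passes through host code that can count it, check it and stop it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set, invented for this sketch. */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_STORE = 0x02 };

enum guest_status { GUEST_HALTED, GUEST_CRASHED };

struct toy_cpu {
    uint8_t  acc;       /* single accumulator register */
    uint16_t pc;        /* program counter */
    uint8_t  mem[256];  /* the guest's entire "RAM" */
    unsigned steps;     /* instrumentation: instructions executed */
};

/* Fetch one byte of guest memory; fail if the guest runs off the end. */
static int fetch(struct toy_cpu *cpu, uint8_t *out)
{
    if (cpu->pc >= sizeof cpu->mem)
        return -1;
    *out = cpu->mem[cpu->pc++];
    return 0;
}

/* Run the guest until it halts or crashes; the host survives either way. */
enum guest_status toy_run(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    for (;;) {
        uint8_t op, arg;
        if (fetch(cpu, &op) < 0)
            return GUEST_CRASHED;
        cpu->steps++;
        switch (op) {
        case OP_HALT:
            return GUEST_HALTED;
        case OP_LOAD:   /* acc = immediate value */
            if (fetch(cpu, &arg) < 0)
                return GUEST_CRASHED;
            cpu->acc = arg;
            break;
        case OP_STORE:  /* mem[arg] = acc; arg is 0..255, always in range */
            if (fetch(cpu, &arg) < 0)
                return GUEST_CRASHED;
            cpu->mem[arg] = cpu->acc;
            break;
        default:        /* unknown opcode: the guest crashes, the host doesn't */
            return GUEST_CRASHED;
        }
    }
}
```

A guest program that contains garbage simply makes toy_run() return GUEST_CRASHED, and the steps counter tells the host how far the guest got before it fell over.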

Another advantage of emulators is that you can run several copies of the emulator at the same time, thus turning one real computer into several pseudo-computers.

For example, if you want to analyse multiple new viruses at the same time, all without putting the real computer – the host system – in any danger, you can run each sample simultaneously in its own instance of the emulator.

One disadvantage of true emulators – those that simulate via software every aspect of the hardware that the emulated programs think they are running on – is the performance overhead of all the emulation.

If you’re pretending to be a 1990s-era GameBoy on a 2020-era MacBook Pro, the modern-day host system is so much faster and more capable than any hardware in the original device that the relative slowness of the emulator is irrelevant – in fact, it typically needs slowing down so it runs at a similar speed to the original.

But if you are trying to emulate a full installation of Linux, where the emulator is running the very same version of Linux on the very same sort of hardware, true emulators are generally too slow for anything but specialised work such as security analysis and detailed bug testing.

In particular, if you want to run 16 instances of a Linux system – for example, to host websites for 16 different customers – on a single physical server, then running 16 copies of an emulator just isn’t going to cut it.

Enter virtualisation

These days, the most popular way to split a computer between multiple different customers is called virtualisation, which is a hardware trick that’s a bit like emulation, but in a way that gives each virtualised computer – called a guest – much closer access to the real hardware.

Most modern processors include special support for virtualisation, where the host computer remains in overall control, but the guest systems run real machine instructions directly in the real processor.

The host computer, the host operating system and the virtualisation software are responsible for keeping the various virtual computers, known as VMs – short for virtual machines – from interfering with each other.

Without proper guest-to-guest separation, cloud-based virtual computing would be a recklessly dangerous proposition.

For all you know, the VM running your company’s web server could inadvertently end up running directly alongside a competitor’s VM on the same physical host computer.

If that were to happen, you’d want to be sure (or as sure as you could be) that there was no way for the other guys to influence, or even to peek at, what your own customers were up to.

Pure-play virtualisation, where each VM pretends it’s a full-blown computer with its own processor, memory and other peripherals, usually involves loading a fresh operating system into each VM guest running on a host computer.

For example, you might have a host server running macOS, hosting eight VMs, one running a guest copy of macOS, three running Windows, and four running Linux.

All major operating systems can run as guests on, or act as hosts for, each other. The only spanner in the works is that Apple’s licensing prohibits the use of macOS guests on anything but macOS hosts, no matter that running so-called “hackintoshes” as guests on other systems is technically possible.

But even this sort of virtualisation is expensive in performance terms, not least because each VM needs its own full-blown operating system setup, which in turn needs installing, managing and updating separately.

Enter containerisation

This is where containerisation, also known as lightweight virtualisation, comes in.

The host system provides not only the underlying physical hardware, but also the operating system, files and processes, with each guest VM (dubbed, in this scenario, a container) running its own, isolated application.

Popular modern containerisation products include Docker, Kubernetes and LXC, short for Linux Containers.

Many of these solutions rely on a common component known very succinctly as runc, short for run container.

Obviously, a lot of the security in and between containerised applications depends on runc keeping the various containers apart.

Container segregation not only stops one container messing with another, but also stops a rogue program bursting loose from its guest status and messing with the host itself.

What if the container bursts open?

Unfortunately, a serious security flaw dubbed CVE-2019-5736 was found in runc.

This bug means that a program run with root privileges inside a guest container can make changes with root privilege outside that container.

Loosely put, a rogue guest could get sysadmin-level control on the host.

This control could allow the rogue to interfere with other guests, steal data from the host, modify the host, start new guests at will, map out the nearby network, scramble files, unscramble files…

…you name it, a crook could do it.

Precise details of the bug are being withheld for a further six days to give everyone time to patch, but the problem seems to stem from the fact that Linux presents the program file of the current process via a special path called /proc/self/exe.

Thanks to CVE-2019-5736, accessing the program file of the runc program that’s in charge of your guest app seems to give you a way to mess with running code in the host system itself.

In other words, by modifying your own process in some way, you can cause side-effects outside your container.

And if you can make those unauthorised changes as root, you’ve effectively just made yourself into a sysadmin with a root-level login on the host server.
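
If you haven’t met /proc/self/exe before, it behaves on Linux like a symbolic link to the executable file that the calling process is running. The short C sketch below simply asks the kernel where that link points. The function name self_exe_path is made up for this example, and the code shows only what the path is, not how the bug is exploited:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the path of the currently running executable into buf.
 * Returns the length of the path, or -1 on error. Linux-specific.
 */
ssize_t self_exe_path(char *buf, size_t size)
{
    assert(buf != NULL);
    if (size == 0)
        return -1;

    /* Leave room for the terminator: readlink() does not add one. */
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

Called from any program, this fills buf with the full path of that program’s own binary, which is why a process inside a container that is being set up by runc could end up with a handle on runc itself.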

For what it’s worth, the runc patch that’s available includes new program code along the following lines, intended to stop containers from messing indirectly with the host system’s running copy of runc:

static int is_self_cloned(void) {...}
static int parse_xargs(...) {...}
static int fetchve(...) {...}
static int clone_binary(...) {...}
static int ensure_cloned_binary(...) {...}


void nsexec(void) {
    . . .
    /*
     * We need to re-exec if we are not in a cloned binary. This is necessary
     * to ensure that containers won't be able to access the host binary
     * through /proc/self/exe. See CVE-2019-5736.
     */
    if (ensure_cloned_binary() < 0)
        bail("could not ensure we are a cloned binary");
    . . .
}
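
The bodies of those functions are elided above, so what follows is not the runc code. It is a rough sketch of one way a program could make a throwaway, read-only copy of itself on Linux, on the assumption that the copy lives in an anonymous in-memory file created with memfd_create() and locked with file seals. The names make_sealed_self_copy, fd_is_sealed_clone and CLONE_SEALS are invented for this example:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Seals that stop anyone growing, shrinking, writing or unsealing the copy. */
#define CLONE_SEALS (F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)

/* Return 1 if fd is an in-memory file carrying all of CLONE_SEALS, else 0. */
int fd_is_sealed_clone(int fd)
{
    assert(fd >= 0);
    int seals = fcntl(fd, F_GET_SEALS);   /* fails on ordinary files */
    return seals >= 0 && (seals & CLONE_SEALS) == CLONE_SEALS;
}

/*
 * Copy the running executable into an anonymous in-memory file and seal it.
 * Returns the sealed file descriptor, or -1 on error.
 */
int make_sealed_self_copy(void)
{
    char buf[4096];
    ssize_t n;

    int src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
    if (src < 0)
        return -1;

    int dst = memfd_create("self-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (dst < 0) {
        close(src);
        return -1;
    }

    /* Plain copy loop; a short write is treated as a failure. */
    while ((n = read(src, buf, sizeof buf)) > 0) {
        if (write(dst, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
    }
    close(src);

    /* Once sealed, the copy can still be read or executed, but not altered. */
    if (n < 0 || fcntl(dst, F_ADD_SEALS, CLONE_SEALS) < 0) {
        close(dst);
        return -1;
    }
    return dst;
}
```

A program using this approach would then re-run itself from the sealed descriptor, for example with fexecve(), so that anything reached via /proc/self/exe is the disposable copy rather than the original file on the host. That step is left out here to keep the sketch short.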

What to do?

Any containerisation product that uses runc is probably vulnerable – if you have a version numbered runc 1.0-rc6 or earlier, you need to take action.

Docker users should check the Docker release notes for version 18.09.2, which documents and patches this bug.

Kubernetes users should consult the Kubernetes blog article entitled Runc and CVE-2019-5736, which explains both how to patch and how to work around the bug with hardened security settings.

As the Kubernetes team points out, this flaw is only exploitable if you allow remote users to fire up containers with apps running as root. (You need root inside your container in order to acquire root outside it – this bug doesn’t allow you to elevate your privilege and then escape.)

You typically don’t want to do that anyway – less is more when it comes to security – and the Kubernetes crew is offering a handy configuration tip on how to ensure your guests don’t start out with more privilege than they need.
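
The Kubernetes tip is a configuration setting, but an application can also look after itself. As a small illustration only, here is a C sketch of a startup check that declines to run with root privilege. The function name refuse_root and the wording of the message are made up for this example, and it complements, rather than replaces, patching runc and locking down your container settings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 0 if the effective user is unprivileged, or -1 (after printing
 * a warning to stderr) if the process is running as root.
 */
int refuse_root(const char *progname)
{
    assert(progname != NULL);
    if (geteuid() == 0) {
        fprintf(stderr, "%s: refusing to run as root\n", progname);
        return -1;
    }
    return 0;
}
```

You’d typically call this first thing in main() and exit with an error if it returns -1, so that a container accidentally launched as root fails fast instead of running with more privilege than it needs.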

Of course, you may be a container user without even realising it, if you have some of your software running in a cloud service.

If in doubt, ask your provider how their service works, whether they’re affected by this bug, and if so whether they’re patched.

Quickly summarised:

  • Patch runc if you’re using it yourself.
  • Stop guest containers running as root if you can.
  • Ask your provider if they’re using runc on your behalf.