Anatomy of a file format problem – yet another code verification bypass in Android

Four months ago, the Android platform was stirred, though fortunately not too badly shaken in the end, by a pair of code verification holes.

Simply put, you could add malware to a legitimate app – one from the Play Store, if you liked, complete with icons, branding and reputation – in such a way that Android’s digital signature verification would consider it unaltered.

From the helpless user’s point of view, Google wouldn’t just fail to warn about the app possibly being dodgy, it would actively assert that it was the a validated and unaltered item from the original, legitimate vendor.

Google, developers, users: everyone lost out except the crooks.

Sloppy coding

Both of those earlier holes came about as a consequence of sloppy coding to do with Android Package (APK) files, the format used for delivering apps.

It turns out that there was a third bug of a similar sort, found at around the same time as the others, but only publicly patched this month, when the open source code from Android 4.4, better known as Kit Kat, was released.

→ Android and iOS low level code maestro Jay Freeman (better known as @saurik), amongst others, found this bug mid-2013 but forbore from writing it up until the patch was officially out.

It’s the sort of mistake that a company with the self-professed security smarts of Google really ought not to have made, not least in one of Android’s much-vaunted security linchpins, namely the compulsory validation of digital signatures when installing new software.

So this is a story worth telling, because it is a powerful reminder of how backward compatibility, multi-language programming, and the re-use of code libraries not designed with security in mind can get in the way of correctness.

Here’s what went wrong.

The how and why of APK files

For reasons I don’t know, but presumably because the format was well-established, Google settled on ZIP files as the storage containers for Android Packages (APKs).

APKs are just ZIPs containing a special subdirectory, called META-INF, that holds signed checksums for all the other files in the package.

Unfortunately, the ZIP file format came out in the late 1980s, aimed at convenience and efficiency, not security.

→ The reason that APKs need a special META-INF directory for cryptographic metadata is that ZIP files were designed to support only the most basic non-cryptographic validation, such as checking that a shareware download wasn’t corrupted by your 0.0012 Mbit/sec modem. Verifying the identity of the original creator was not a consideration.

A ZIP file, also known as an archive, is effectively a special-purpose filing system.

ZIPs can store multiple files in a directory structure, with each file and directory individually named, timestamped, compressed and, optionally (albeit insecurely) encrypted.

Today, of course, despite the giant size of many software distributions, removable storage devices are usually larger than the files you’re downloading – OS X Mavericks, at a whopping 5.3GB, for example, fits easily onto all but the very cheapest and smallest USB sticks on the market.

But that wasn’t true in the 1980s and 1990s, when downloads often ran to several megabytes, while a regular floppy diskette could store just 720KB.

ZIP files, therefore, were not only compressed to save space, but also laid out so that they could easily be split across multiple diskettes.

Better yet, they could be restored – file by file, if necessary – without requiring you to insert every diskette one-by-one to work out the directory structure before starting.

As a result, the ZIP format is deliberately redundant, and its internal directory structure is recorded twice.

The file and directory names are stored first as a series of individual local headers interleaved with the data for each file, and then stored again in the curiously-named central directory tacked on the end.

By keeping the central directory to the end, the ZIP program never needs to ask you to reinsert an earlier floppy disk to rewrite it when building the archive.

And by interleaving file headers throughout, ZIP files can still be recovered, at least in part, even if the last floppy in the set is lost or damaged.

The downside of redundancy

This sort of redundancy is handy in an emergency, but can be dangerously distracting during routine operations.

There’s a famous nautical adage (or if there isn’t, there should be) that says, “When you set to sea, take three chronometers. If one of them breaks, throw one of the remaining two overboard.”

I made that up, but the reasoning is sound: with three clocks, you still have a majority vote if one goes wrong.

But if you have two, and they read differently, what are you going to do to resolve the dilemma?

You face a similar problem with ZIP file metadata.

What Google really ought to do – or ought to have done when the first two APK holes surfaced – is to break this dilemma permanently by treating APK files in one or both of these ways:

  • Pick one of the two file metadata systems in the ZIP format, and use it exclusively when decompressing APKs, deliberately removing or avoiding any library code that might read and rely on the alternative metadata and thus harm security.
  • As part of validation, before trusting any file objects inside an APK, check that the two directory structures are identical, giving the same filenames, timestamps, sizes, and so forth. If not, assume corruption or malevolence.

The latest flaw

The ZIP file ambiguity exploit patched in Android 4.4 abuses the filename length field in a ZIP file’s metadata.

This tells you how many bytes to skip forward in the local file header to get past the filename in the header itself to the actual file data, and how many bytes to skip forward in the central directory to get past the filename to the next directory entry. (There is no file data in the central directory, only file metadata.)

You can probably guess what’s coming next.

The Java code in Android 4.3 and earlier that extracts the file data to verify it uses the filename length from the central directory.

But the C code that extracts the file to install and execute it uses the filename length in the local header.

So you can deliberately craft a a file that is laid out as shown above, with the local header filename length deliberately set so large that it points past both the filename and the original file data.

This presents one file to the verifier, and a different file to the operating system loader.

Very simply put: the loader can be fed malware but the verifier will never see it.

How can that work?

At this point you may be wondering how this subterfuge can possibly work, unless the dodgy file is the last in the archive and Android doesn’t check for a neat conclusion to its file-by-file processing of the APK.

After all, in the above diagram, surely the C code will see an absurd and deeply suspicious filename?

The filename length is so big that the C code will see the real filename with the raw binary content of the original file (shaded green) tacked on the end, and that won’t match anything in the META-INF security database.

And surely the Java code that does the verification will get lost when moving forward in the APK?

The data that follows the original file data is supposed to be the next local file header, recognisable by its PK\x3\x4 “magic” string, but in our example, the file data is followed by yet more data – the imposter file (shaded pink).

Saurik explains this very simply in his coverage of the bug: the C code ignores the filename from the local header; and the Java code uses file offsets from the central directory to navigate through the archive.

So neither of the giveway “something’s gone wrong” conditions described above arises.

→ The central directory includes a file offset for each local header, so that once the Java code has finished verifying a file, it can jump directly to the next one, thus avoiding the local header data that would cause it to skip forward incorrectly. The imposter data, squeezed between the legitimate file and the next local header, is simply ignored.

What to do

Google doesn’t seem to have gone for a holistic fix, such as either or both of those listed above.

But the Android maintainers have made a small but apparently effective adjustment by altering the Java-based validation code so that it follows a similar path through the data to that used by the loader.

By forcing the Java code to rely on the local header data to find its way to the file data, the verifier will check what the loader is going to run, not merely what an attacker wants it to see.

I still think that disallowing APK files altogether if they contain discrepancies between the two streams of file metadata would be a more solid and satisfying approach, but we shall have to take what Google has given us.

And given the comment in the old code noting that the “extra field” data could vary from the central directory (the cause of the previous verification hole), you’d have thought that the programmer might have thought ahead and applied the same logic proactively to the filename length.

But the laconic variable name localExtraLenOrWhatever in the old code suggests that the programmer didn’t have security on his mind when he wrote that snippet, so the proactive fix didn’t happen, thus retaining the filename length vulnerability.

So, until your device gets upgraded to Android 4.4, you’re at risk.

We offer these three tips:

  • Stick to the Google Play Store, where we hope that Google has taken a holistic approach and is rejecting submissions with fishy-looking metadata in their APK files.
  • Use an Android anti-virus (yes, Sophos just happens to have a good one, and it’s free from the Play Store) that can scan newly-installed packages automatically before you run them.
  • If you’re a programmer, don’t follow Google’s lead here – code with security on your mind all the time.

Neither the Play Store nor your favourite anti-virus can guarantee to keep all unwanted apps off your device, but together they will come close.