Machine learning self defence: how to not shoot yourself in the foot


Thanks to Dr. Richard Harang and Madeline Schiappa of SophosLabs for their work on this article.

In case you hadn’t noticed, machine learning is big, really big. I don’t just mean “blockchain on the crest of the Hype Curve” big, I mean actually, you know, useful.

Like the much talked about blockchain, machine learning is being touted as a technology that could change everything. Unlike the blockchain, it probably will.

It’s already proven itself to be a disruptive technology in tasks as diverse as spotting bank fraud, driving cars, understanding human speech and identifying malware.

As a result, organisations are planning to spend tens of billions of dollars on it in the next few years, which means that in the near future lots of people will be building and using machine learning-based solutions for the very first time.

Hopefully they’ll do it securely but, unfortunately, computing’s gold rushes have a terrible record when it comes to cybersecurity. New technologies often usher in new ways to get compromised by hackers (or even old ways we thought we’d seen the back of – yes, I’m looking at you, Internet of Things).

We wondered: what kind of threats will organisations that plan to roll their own machine learning solutions need to be aware of?

Naked Security sat down with SophosLabs’ data scientists Dr. Richard Harang and Madeline Schiappa to learn about the threats that machine learning solutions face and how they can be protected.

We’ll look at how hackers might disrupt or corrupt your machine learning in later articles, but we start with arguably the biggest threat you face: yourself.

Machine learning is new, subtle and complex, and the potential for self-inflicted wounds is high.

Before we get into why, let’s recap what machine learning actually is.

Machine Learning

Traditional software is, essentially, a set of rules that governs how a computer should behave in a particular context. It’s very good at dealing with well-structured data, and at the things we’re bad at: executing highly complex sets of instructions within strict parameters, perfectly, over and over, at tremendous speed.

Machine learning is a branch of AI (Artificial Intelligence) that uses software models that are taught by example and figure out their own rules. According to Arthur Lee Samuel, the computing pioneer who invented the term, machine learning is:

[a] field of study that gives computers the ability to learn without being explicitly programmed.

Where conventional software is characterised by rigidity, transparency and provably correct behaviour, machine learning is fuzzy, flexible, opaque and only likely (rather than certain) to behave a particular way.

Machine learning models can generalise in a way that software based on programmed rules can’t, giving us an entirely new way to approach problem solving with computers.

Garbage in, garbage out

Every computer programmer, going all the way back to computing pioneer Charles Babbage, knows that no amount of programming wizardry can get the right answer from the wrong information.

Babbage, who designed the first mechanical computers way back in the nineteenth century, might not have used the acronym GIGO (Garbage In, Garbage Out) but he certainly understood it:

On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

GIGO presents a particularly acute challenge for complex machine learning systems because the garbage can be very difficult to detect, which can lead to subtle errors and unintended consequences.

Get your training data or testing a little bit wrong and you’ve accidentally trained your model to recognise incidental, correlated phenomena rather than the things that actually matter.

Or, to put it another way, you’ve unwittingly made your face recognition system racist.


Modern machine learning models work so well because they’re extremely good at finding subtle and complex correlations in your training data. This allows them to learn ways of recognising things – whether it’s faces, patterns of fraud or spam – that human programmers can’t match.

But this powerful capability can backfire in unexpected ways. If your training data contains a correlation that’s spurious or doesn’t exist in the wild, your model can easily learn the wrong lessons.

And Big Data is full of spurious correlations (increasing global average temperatures are correlated with the decline in pirate numbers, for example, as any good Pastafarian can attest).

Let’s imagine we’re training a machine learning model to recognise spam email. Our training data is a database of emails that humans have diligently labelled as either ‘ham’, the emails we like, or ‘spam’, the emails we don’t. Our machine learning system will read the emails and figure out what separates the fresh pork from the canned meat.

Now let’s imagine that our training data contains a plausible but spurious correlation: by chance, every email with an image attachment that came from an IP address ending in 12 has ended up in the spam pile, just because.

There are many other spam emails in our training data that don’t have those two properties; it just so happens that every email that does is in the spam pile. And the humans making the training data didn’t mark those particular emails as spam because of the IP address and the image attachment; it’s just a coincidence.

According to the kinds of checks we might use on a traditional computer program, we haven’t introduced any garbage into our system. The emails in our training data aren’t garbage: they’re well-formatted, standards-compliant emails. Our labelling isn’t garbage either: the hams are ham and the spams are spam.

Nonetheless we have introduced some garbage.

If our model is sufficiently complex it may infer that a sender’s IP address ending in 12, in an email with an image attachment, is a sure-fire indicator of spam when, in fact, outside of our training data it isn’t.

When it’s deployed in the real world our anti-spam engine will block a lot of perfectly good emails with image attachments from people whose IP address ends in 12.
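One way to go looking for this kind of garbage before training is to list feature combinations that only ever appear under one label. The sketch below is a minimal illustration, not a complete audit: the feature names (`has_image`, `ip_ends_12`, `mentions_prize`), the toy emails and the `suspicious_feature_pairs` helper are all made up for this example.

```python
from collections import Counter
from itertools import combinations


def suspicious_feature_pairs(emails, min_count=3):
    """Return feature pairs seen at least `min_count` times and only ever in spam."""
    total_counts = Counter()
    spam_counts = Counter()
    for features, label in emails:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(features), 2):
            total_counts[pair] += 1
            if label == "spam":
                spam_counts[pair] += 1
    return sorted(
        pair
        for pair, total in total_counts.items()
        if total >= min_count and spam_counts[pair] == total
    )


# Hypothetical toy training set: (set of features, human label).
training_emails = [
    ({"has_image", "ip_ends_12"}, "spam"),
    ({"has_image", "ip_ends_12", "mentions_prize"}, "spam"),
    ({"has_image", "ip_ends_12"}, "spam"),
    ({"mentions_prize"}, "spam"),
    ({"has_image"}, "ham"),
    ({"ip_ends_12"}, "ham"),
    ({"has_image", "mentions_prize"}, "ham"),
]

flagged = suspicious_feature_pairs(training_emails)
print(flagged)  # [('has_image', 'ip_ends_12')]
```

Anything the check flags is only a candidate for a human to look at. A pair that always means spam might be a genuine signal, and only someone who understands the data can tell the difference.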

We broke our example with bad data but we could just as easily have done it by accidentally labelling some of our spam as ham and some of our ham as spam.

In our example the flaw we introduced was quite subtle, but machine learning systems aren’t limited to making subtle errors.

Machine learning models are probabilistic rather than provably correct, so we can only say what a model is likely to do in the real world, not what it will do. Deciding if our software is working correctly means first deciding what ‘correctly’ will look like and then testing to see if that’s what our software does.

The less thorough our testing is, the bigger the flaws that can escape into the real world.
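Deciding what ‘correctly’ looks like can be as simple as writing down the error rates you’re prepared to live with and checking a held-out test set against them. Here’s a minimal sketch; the `error_rates` and `meets_target` helpers, the toy labels and the thresholds are assumptions for illustration rather than recommended values.

```python
def error_rates(labels, predictions):
    """Return (false positive rate, false negative rate) for a spam filter.

    Assumes both 'ham' and 'spam' appear at least once in `labels`.
    """
    false_pos = sum(1 for y, p in zip(labels, predictions) if y == "ham" and p == "spam")
    false_neg = sum(1 for y, p in zip(labels, predictions) if y == "spam" and p == "ham")
    return false_pos / labels.count("ham"), false_neg / labels.count("spam")


def meets_target(labels, predictions, max_fp_rate, max_fn_rate):
    """True only if both error rates are within the limits we agreed up front."""
    fp_rate, fn_rate = error_rates(labels, predictions)
    return fp_rate <= max_fp_rate and fn_rate <= max_fn_rate


# Hypothetical held-out test set: 8 hams and 4 spams the model never trained on.
held_out_labels = ["ham"] * 8 + ["spam"] * 4
model_predictions = ["ham"] * 7 + ["spam"] + ["spam"] * 3 + ["ham"]

print(error_rates(held_out_labels, model_predictions))  # (0.125, 0.25)

# Blocking good email hurts more than missing spam, so the limits differ.
print(meets_target(held_out_labels, model_predictions,
                   max_fp_rate=0.05, max_fn_rate=0.30))  # False
```

The model here misses the target because it blocks one good email in eight, even though its spam detection is within the limit we set.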

Sophos data scientist Hillary Sanders presented at the 2017 Black Hat conference, showing how even slight mismatches between the kind of data a model is trained on and the kind of data that a model sees in the wild can lead to a significant drop in performance.

Even with great labels and a lot of data, if the data we use to train our deep learning models doesn’t mimic the data it will eventually be tested on in the wild, our models are likely to miss out on a lot.

This kind of mismatch will likely lead your model to learn a core of rules that work well in most cases, as well as a number of less important rules that don’t apply to the real world.

And that’s a recipe for Heisenbugs (or shooting yourself in the foot and being unable to find the gun).
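A rough first check for this kind of mismatch is to compare how often each feature shows up in your training data with how often it shows up in a sample of real traffic. The sketch below assumes each email has already been reduced to a set of feature names; the helper names, the features and the 0.2 threshold are hypothetical.

```python
from collections import Counter


def feature_rates(samples):
    """Fraction of samples in which each feature appears."""
    counts = Counter(feature for features in samples for feature in features)
    return {feature: count / len(samples) for feature, count in counts.items()}


def drifted_features(train, wild, threshold=0.2):
    """Features whose frequency differs by more than `threshold` between the two sets."""
    train_rates = feature_rates(train)
    wild_rates = feature_rates(wild)
    report = {}
    for feature in set(train_rates) | set(wild_rates):
        gap = abs(train_rates.get(feature, 0.0) - wild_rates.get(feature, 0.0))
        if gap > threshold:
            report[feature] = gap
    return report


# Hypothetical samples: what we trained on versus what turned up in the wild.
train_sample = [{"has_image", "has_link"}, {"has_image"}, {"has_image", "has_link"}, set()]
wild_sample = [{"is_html", "has_link"}, {"is_html", "has_image"}, {"is_html"}, {"has_link"}]

drift = drifted_features(train_sample, wild_sample)
print(sorted(drift.items()))  # [('has_image', 0.5), ('is_html', 0.75)]
```

A feature the model never saw in training (`is_html` here) or one that is far rarer in the wild than in training is a hint that the two data sets don’t match, though it won’t tell you how much performance you’ll lose.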

So, in a nutshell: a model doesn’t know what you don’t tell it, and you haven’t always told it what you think you’ve told it.

What to do?

There isn’t an easy solution to the GIGO problem but, then again, when are silver bullets not in short supply? We suggest you start here:

  • Use good data! It’s obvious, yes, but no less true for that. Be careful to only feed your model with lots of well-labelled data from sources that represent what it will see in the real world.
  • Get used to cleaning because it’s a big, unglamorous and important job. When you discover a self-inflicted wound you’ll need to change the way you clean and label your data to account for your mistake. Rinse and repeat.
  • Try not to overtrain your model or you’ll overfit it. Remember, you aren’t trying to recognise your training data with perfect clarity; you’re trying to recognise things that share similarities with your training data.
  • Expect “label noise” and use training methods that reduce its negative effects. Says Schiappa: “This usually means using a robust loss function, to represent the cost of inaccuracy in the model during training”. Your training process should work to minimise this cost.
  • Pay attention to false positives and false negatives during testing, and let your model help you clean your data. Sometimes your labels were wrong, and the model was right.
  • Consider deep learning as your machine learning method of choice. Research indicates that it’s better at dealing with label noise than shallower learning methods (which is one of the reasons that Sophos uses deep learning to build its machine learning models).
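Two of the suggestions above, using a robust loss function and letting the model help clean your data, can be sketched in a few lines. This is an illustration under our own assumptions: the article doesn’t say which loss function SophosLabs uses, so the bounded loss `1 - p` below is just one simple example of a loss that a single bad label can’t dominate, and the scores and the 0.9 confidence cut-off are invented.

```python
import math


def cross_entropy(p_label):
    """Standard log loss, given the probability the model gave the labelled class."""
    return -math.log(p_label)


def bounded_loss(p_label):
    """A simple bounded alternative: never exceeds 1, however wrong the label is."""
    return 1.0 - p_label


def review_candidates(examples, confidence=0.9):
    """Ids of examples where the model confidently disagrees with the human label."""
    return [
        example_id
        for example_id, label, p_spam in examples
        if (label == "ham" and p_spam >= confidence)
        or (label == "spam" and p_spam <= 1.0 - confidence)
    ]


# Hypothetical model outputs: (id, human label, model's probability of spam).
scored = [
    ("a", "spam", 0.97),
    ("b", "ham", 0.03),
    ("c", "ham", 0.99),   # model is sure this is spam: possible labelling error
    ("d", "spam", 0.40),
    ("e", "spam", 0.02),  # model is sure this is ham: possible labelling error
]

print(review_candidates(scored))  # ['c', 'e']

# One mislabelled example costs far more under cross-entropy than under the bounded loss.
print(round(cross_entropy(0.01), 2), bounded_loss(0.01))
```

The flagged examples go back to a human for a second look rather than being relabelled automatically, because sometimes the model is the one that’s wrong.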