You’ve probably heard of bots.
In this case, we don’t mean the malware bots (also known as zombies) that infect your computer so that crooks can send it commands from afar, and that are herded together into robot networks, or botnets.
We mean computer bots such as chatbots, which pretend to be human participants in online conversations; or gamebots, which try to mimic human participation in online games; or web indexing tools, like the Googlebot, that act like an insatiable human internaut browsing the web by clicking through to everything in sight.
We’ve already written about Rose, a prize-winning chatbot that didn’t convince everybody; about Talking Angela, a computer-game talking catbot that produced the longest thread of comments ever seen on Naked Security; and about Dreamwriter, an articlebot from China that churns out online news pieces.
Now, an MIT student and his supervisor have recently published a paper that approaches robot creativity from another angle: rather than churning out new stuff, their coderbot, called Prophet, tries to fix bugs in already-published code.
The paper is rather technical, and at 15 densely-packed pages, you probably won’t get through it during a regular-sized tea break.
But this, in a nutshell, is what Prophet does:
“Prophet works with universal features that abstract away shallow syntactic details that tie code to specific applications while leaving the core correctness properties intact at the semantic level. These universal features are critical for enabling Prophet to learn a correctness model from patches for one set of applications, then successfully apply the model to new, previously unseen applications.”
We’ve convinced ourselves we understand what that means, and that we have absorbed enough of the paper, thanks to an extended tea break, to summarise Prophet’s approach.
The important feature of Prophet is that, like Google’s language translation (if we have figured that out sufficiently well), it doesn’t work by trying to understand exactly how a piece of buggy code works in order to figure out what’s wrong with it.
Instead, it uses a probability-based approach, where it applies a number of different possible changes to the buggy code, retesting it every time, and figuring out which changes produce the best improvements in test results.
Of course, the number of possible changes you can make even to a tiny fragment of code – such as adding an error check up front, adding one afterwards, running a loop one time fewer, running it one time more, removing a line of code here, adding one there – is enormous, and if you combine the possible changes in all possible ways, you get an exponential explosion in the number of tests you have to do.
So, Prophet’s smarts involve choosing carefully what changes to try, based on pre-processing a corpus of existing code changes (known as diffs, short for differences) that are known to have fixed similar bugs before.
But once it’s proposed a series of changes, Prophet needs a way of evaluating how well the changes worked.
As the authors point out, that part of the system relies on an existing suite of software tests that can already detect the bug you are trying to fix.
In other words, you already need a software test that will fail on the buggy code, so you have some way of measuring whether your proposed changes stopped the bug from triggering.
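To make the try-it-and-retest idea concrete, here is a toy sketch in Python. It is emphatically not Prophet's code: the buggy function, the three candidate changes and the scores attached to them are all invented for illustration, with the made-up scores standing in for whatever ranking a model learned from past diffs might produce. The test suite includes one test that fails on the buggy code, as described above.

```python
from typing import Callable, List, Optional, Tuple

Program = Callable[[List[int]], int]


def buggy_max(items: List[int]) -> int:
    """Hypothetical buggy code: meant to return the largest item."""
    best = items[0]
    for i in range(len(items) - 1):  # bug: the loop stops one item short
        if items[i] > best:
            best = items[i]
    return best


def patch_add_guard(items: List[int]) -> int:
    """Candidate change: add an error check up front (does not fix the loop)."""
    if not items:
        return 0
    return buggy_max(items)


def patch_loop_one_more(items: List[int]) -> int:
    """Candidate change: do the loop one time more."""
    best = items[0]
    for i in range(len(items)):
        if items[i] > best:
            best = items[i]
    return best


def patch_loop_one_fewer(items: List[int]) -> int:
    """Candidate change: do the loop one time fewer."""
    best = items[0]
    for i in range(len(items) - 2):
        if items[i] > best:
            best = items[i]
    return best


# Existing test suite as (input, expected) pairs; the last test fails on buggy_max.
TESTS: List[Tuple[List[int], int]] = [([3, 1, 2], 3), ([5], 5), ([1, 2, 3], 3)]

# (score, description, patched program); the scores are invented stand-ins
# for a learned ranking of which changes are most likely to be real fixes.
CANDIDATES: List[Tuple[float, str, Program]] = [
    (0.6, "add guard", patch_add_guard),
    (0.5, "loop one more", patch_loop_one_more),
    (0.1, "loop one fewer", patch_loop_one_fewer),
]


def passes_all(program: Program, tests: List[Tuple[List[int], int]]) -> bool:
    """Run every test; a wrong answer or a crash counts as a failure."""
    for args, expected in tests:
        try:
            if program(list(args)) != expected:
                return False
        except Exception:
            return False
    return True


def search(candidates, tests) -> Optional[Tuple[str, int]]:
    """Try candidates from highest score to lowest and return the first one
    that passes every test, along with how many candidates were tried."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for tried, (_score, name, program) in enumerate(ranked, start=1):
        if passes_all(program, tests):
            return name, tried
    return None


print(search(CANDIDATES, TESTS))  # ('loop one more', 2)
```

The ranking only decides the order in which changes are tried; it is the test suite that has the final say, which is why the top-scored "add guard" change is rejected here.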
WHEN PASS MEANS FAIL
Of course, as the authors explain by means of an example based on a bug in PHP, the fact that a change causes a product to pass its tests doesn’t mean that the change is a fix for the bug.
More importantly, the change could reliably stop the current bug from triggering, but at the cost of introducing a new bug that no current test can detect.
You’d end up in an even worse situation: a newly-buggy program that passes all its tests.
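Here is a small Python sketch of how that can happen. It is our own invented example, not the PHP bug from the paper: a hypothetical averaging function that crashes on an empty list, a deliberately thin test suite, and two possible patches, one of which merely looks like a fix.

```python
from typing import Callable, List, Tuple

Average = Callable[[List[float]], float]


def buggy_average(values: List[float]) -> float:
    """Hypothetical buggy code: crashes with ZeroDivisionError on []."""
    return sum(values) / len(values)


def overfit_patch(values: List[float]) -> float:
    """Stops the crash, but quietly breaks the averaging itself."""
    if not values:
        return 0.0
    return float(values[0])  # new bug: ignores every value but the first


def correct_patch(values: List[float]) -> float:
    """Stops the crash and still computes the mean."""
    if not values:
        return 0.0
    return sum(values) / len(values)


# A weak suite: no test uses a list whose values differ from one another.
SUITE: List[Tuple[List[float], float]] = [
    ([], 0.0),
    ([4.0], 4.0),
    ([2.0, 2.0], 2.0),
]


def run_suite(fn: Average, suite: List[Tuple[List[float], float]]) -> bool:
    """Return True only if fn gives the expected answer on every test."""
    for values, expected in suite:
        try:
            if fn(list(values)) != expected:
                return False
        except Exception:
            return False
    return True


print(run_suite(buggy_average, SUITE))  # False
print(run_suite(overfit_patch, SUITE))  # True
print(run_suite(correct_patch, SUITE))  # True
print(overfit_patch([2.0, 4.0]))        # 2.0
print(correct_patch([2.0, 4.0]))        # 3.0
```

As far as this test suite is concerned, the two patches are indistinguishable; only an input that no test covers reveals the difference.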
So, perhaps what we need now is a testbot that can automatically generate an effective and comprehensive test suite for any given program…
…and then we can improve the quality of the testbot by using Prophet to get rid of its bugs.
On a serious note, we welcome this sort of research.
The crooks already have a variety of tricks and tools that help them find bugs, such as fuzzing, where you deliberately introduce corrupted input to see if there are any patterns of software misbehaviour that let you control the resulting corruptions in output.
So a tool that can automate our response to automatically-found vulnerabilities can help to close the “exploit gap.”
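If you haven’t seen fuzzing in action, the following Python sketch gives the flavour. Everything in it is hypothetical: the record format (a length byte, a payload and a one-byte checksum), the parser with its planted bug, and the very crude mutate-one-byte strategy, which is far simpler than what real fuzzing tools do.

```python
import random
from typing import List


def parse_record(data: bytes) -> bytes:
    """Hypothetical parser for: [length][payload bytes...][checksum].

    Malformed input is supposed to be rejected with ValueError.
    """
    if len(data) < 2:
        raise ValueError("record too short")
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]  # bug: trusts the length byte, may read past the end
    if checksum != sum(payload) % 256:
        raise ValueError("bad checksum")
    return payload


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Corrupt one randomly chosen byte of a known-good input."""
    corrupted = bytearray(seed)
    corrupted[rng.randrange(len(corrupted))] = rng.randrange(256)
    return bytes(corrupted)


def fuzz(seed: bytes, rounds: int, rng_seed: int = 0) -> List[bytes]:
    """Feed corrupted inputs to the parser and collect the ones that make it
    fail in a way it was never meant to (anything other than ValueError)."""
    rng = random.Random(rng_seed)
    crashes: List[bytes] = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except ValueError:
            pass  # a clean rejection: the parser coped as intended
        except Exception:
            crashes.append(candidate)  # unexpected misbehaviour worth a closer look
    return crashes


GOOD_RECORD = bytes([3, 1, 2, 3, 6])  # length 3, payload 1 2 3, checksum 6
crashes = fuzz(GOOD_RECORD, rounds=200)
print(len(crashes) > 0)
```

Every crashing input found this way has an oversized length byte, which is exactly the sort of pattern that an attacker, or a defender, would investigate next.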
If there’s one thing that worries us about Prophet, however, it’s this.
Code that has a poorly-written and ineffective test suite is, we assume, more likely to have bugs that need fixing.
Yet code with a poor test suite is correspondingly more likely to provide weak pass/fail feedback to probabilistic tools like Prophet, thus making it more likely that bug fixes will result in new bugs that need fixing.
What do you think? Do you welcome our new bug-fixing overlords?