Sticks and stones may break my bones, but words will pass through me undetected?

The big scare story of this week is based on recent research demonstrating a technique to write shellcode that resembles English text.

Some bright spark naively suggested that this will make the shellcode almost impossible for anti-virus scanners to detect. Yeah right, never heard that one before! Predictably, however, scaremongers are now jumping on the bandwagon and proclaiming the defeat of anti-virus.


The following paragraph apparently contains the start of some shellcode that can be used to bootstrap arbitrary code execution on your PC:

"There is a major center of economic activity, such as Star Trek, including The Ed Sullivan Show. The former Soviet Union. International organization participation."

So has your PC just been infected by reading this? Of course not.

No text in the world is going to execute malicious behaviour on your PC while it is just being interpreted as plain text. Even though the bytes within the above paragraph also represent a legitimate sequence of CPU instructions, they have to be placed somewhere where they will be executed as CPU instructions before they can do anything.
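To make the "same bytes, two readings" point concrete, here is a minimal sketch that decodes a handful of ASCII characters as 32-bit x86 instructions. It covers only the single-byte opcodes 0x40 to 0x5F (inc, dec, push and pop on a register), which happen to overlap with the uppercase letters. The function names are made up for this illustration, and it is nothing like a full disassembler.

```python
# Toy decoder for a tiny slice of 32-bit x86: single-byte opcodes 0x40-0x5F.
# In that range the top five bits select the operation and the low three
# bits select the register. Everything else is deliberately not handled.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
OPS = {0x40: "inc", 0x48: "dec", 0x50: "push", 0x58: "pop"}


def decode_byte(b):
    """Return the mnemonic for one byte in 0x40-0x5F, or None if outside it."""
    if 0x40 <= b <= 0x5F:
        return f"{OPS[b & 0xF8]} {REGS[b & 0x07]}"
    return None


def decode_text(text):
    """Decode each ASCII character of text with decode_byte."""
    return [decode_byte(b) for b in text.encode("ascii")]


if __name__ == "__main__":
    for ch, insn in zip("SPAM", decode_text("SPAM")):
        print(ch, hex(ord(ch)), insn)
```

The word "SPAM" is harmless as text; the same four bytes only mean "push ebx, push eax, inc ecx, dec ebp" if the processor is actually told to execute them.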

So, if the above words appear in a plain text file, or within this blog article, Sophos Anti-Virus (SAV) is probably not going to take much notice of them. However, if they appear within an executable section of a Windows program, or in a document crafted to generate a buffer overflow, our AV engine can decide to investigate further.

Will that investigation be harder than any of the other challenges that have confronted us in the technology war between malware authors and anti-virus researchers? Not really. Why does anyone think code hidden in benign-looking text is harder to find than code hidden within benign-looking code? Polymorphic mid-infecting viruses were a much greater challenge to the AV industry than this will ever be.

Indeed, given that it takes a few hours' worth of moderate computing power to generate a piece of “English Shellcode”, we are not going to see new variants pumped out every second like we see with some polymorphic code generators. If “English Shellcode” ever appears in the wild, only a few specific variations of the bootstrap shellcode are likely to be used. They could quite effectively be detected by the oldest and simplest of techniques: specific pattern matching.
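For readers who have not seen it, specific pattern matching amounts to little more than searching a buffer for known byte strings. The sketch below is a generic illustration and not how any Sophos product is implemented; the signature name and the choice of a fragment of the sample paragraph quoted above as the pattern are assumptions made purely for the example.

```python
def find_signatures(data, signatures):
    """Return a list of (name, offset) for every signature found in data."""
    hits = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)
        while offset != -1:
            hits.append((name, offset))
            # Keep searching after this hit so repeats are reported too.
            offset = data.find(pattern, offset + 1)
    return hits


# Hypothetical signature table: the name and pattern are illustrative only.
SIGNATURES = {
    "EnglishShellcode-sample": b"such as Star Trek, including The Ed Sullivan Show",
}

if __name__ == "__main__":
    buffer = b"\x90\x90" + SIGNATURES["EnglishShellcode-sample"] + b"\xcc"
    print(find_signatures(buffer, SIGNATURES))
```

A real scanner would only apply such a signature in contexts where the bytes could be executed, which is why this very article does not count as a detection.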

However, everyone should know by now that modern anti-virus does not rely just on pattern matching. Sophos’ Behavioral Genotype technology can be used very effectively here. As soon as “English Shellcode” is placed into a context where it might actually get executed as shellcode, there will be suspicious behavioral clues. For example, SAV uses sandboxed emulation to explore potential execution paths before allowing an executable to really run on the host computer. If the execution path suddenly jumps to what previously looked like English text, that in itself is very suspicious.
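The "jump into text" clue can be illustrated with a toy heuristic: given the bytes of an image and a jump target reported by some emulator, check whether the region at the target reads like prose. This is only a sketch of the idea under simplifying assumptions; the function names, window size and threshold are invented for the example and do not describe the actual emulation or Behavioral Genotype logic.

```python
import string

# Bytes we treat as "prose-like": letters, spaces and common punctuation.
TEXT_BYTES = set((string.ascii_letters + " .,;:'\"!?-\n").encode("ascii"))


def text_ratio(region):
    """Fraction of bytes in region that look like ordinary English text."""
    if not region:
        return 0.0
    return sum(b in TEXT_BYTES for b in region) / len(region)


def jump_is_suspicious(image, target, window=64, threshold=0.95):
    """Flag a jump whose destination is almost entirely prose-like bytes.

    window and threshold are arbitrary illustrative values, not tuned ones.
    """
    region = image[target:target + window]
    return len(region) >= 16 and text_ratio(region) >= threshold


if __name__ == "__main__":
    prose = b"There is a major center of economic activity, such as Star Trek, including"
    image = bytes([0x90] * 32) + prose
    print(jump_is_suspicious(image, 32))  # destination is the prose
    print(jump_is_suspicious(image, 0))   # destination is mostly NOP bytes
```

Ordinary compiled code rarely consists of long runs of letters, spaces and punctuation, so a destination that does is worth a closer look.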

It is no harder to detect shellcode within text than to detect it in other formats, such as picture and metafile data. The reason anti-virus engines do not usually pay much attention to scanning plain text is not because they are incapable of it, but because there is currently no need for it. We make design choices to optimize scanning speed. Those choices could easily be changed if necessary, but I do not know of any significant exploit in common editors for plain text files. Nor, as I said earlier, does this new technique make plain text scanning necessary.

If in-depth textual analysis ever does become necessary, then the relevant skills are already right here in SophosLabs. For example, we do scan HTML and script texts thoroughly, while our anti-spam products do all sorts of textual analysis when they scan emails.

“English Shellcode” is being touted as a major proof-of-concept breakthrough, but in my opinion it is little more than a party trick. There are some contexts where it could necessitate extra scanning of textual data, so at worst this becomes one more thing to check for, one more thing that adds a few CPU cycles to the scan time. It is certainly not undetectable, and if we ever see real-life malicious examples of textual shellcode, Sophos products will protect you from them.

Remember that the processor most likely to be exploited by words is not the desktop CPU but the human brain. Don’t let the scaremongers frighten you!
