Are anti-virus testers measuring the right things?


Last week saw the first Workshop on Anti-Malware Testing Research (WATeR), held in Montréal, Canada - a conference bringing together security and testing experts from industry and academia to discuss testing-related matters.

Among the papers presented were several looking at the sort of things current tests of anti-malware solutions reveal, and some things they do not.

Several of the papers updated topics that were previously discussed at Virus Bulletin conferences and elsewhere.

There was an in-depth talk analysing just how many samples or "test cases" a test needs to include to provide a statistically significant picture of performance against the huge numbers of new threats appearing each day (the short answer - a lot), and what aspects of sample selection may bias results.

There was also a description of the methods used by the École Polytechnique de Montréal (which hosted the conference) in a "field trial" of anti-malware.

This mirrored techniques used in clinical trials, by handing out laptops to real-world users, letting them do what they wanted with them, then periodically checking the machines out to see what threats they'd been hit with, and what, if anything, got past the defences installed.

One of the more thought-provoking talks came from Florida Institute of Technology professor and AMTSO president Dr Richard Ford, who asked "Do we measure resilience?"

Ford differentiated between "robustness", defined as the ability of solutions to prevent malware from penetrating systems at all, which is covered by most anti-malware tests, and "resilience", by which he meant the ability of protection and protected systems to recover from attacks which do manage to get through the border controls and establish a foothold on the machine.

He argued that the resilience side of things is important to end users and sysadmins, but is rarely covered in much depth in public tests.

For the most part, the leading comparative and certification tests look mainly at detection or protection metrics. We measure how many threats a product can pick up with its scanners, or how many it can block with the various other layers of filters and monitors included in most products these days.

These would all be robustness measures.

Resilience might perhaps be covered by a removal or clean-up test - seeing how well a product can deal with an infected machine. Some tests include these, but they tend to be performed separately from the "robustness" tests, as it's hard to tell how well a product can clean something up if it doesn't let the machine get infected in the first place.

Ideally, Ford argues, a clean-up test would be run as part of a protection test - any threats which are not blocked initially should be allowed to run to see if they are blocked or removed later on.

If a threat can disable the security product and take complete control of the machine permanently, that's basically zero for resilience; however, if the threat can only run for a while before fresh updates allow the protection to recover and clean the infection up, that's a little better.
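To make that idea concrete, here's a minimal sketch of how such an outcome might be scored - my own illustration, not any test lab's actual metric: immediate blocking gets full marks, permanent compromise gets zero, and late clean-up earns partial credit that shrinks the longer the infection persisted.

```python
def resilience_score(blocked_on_entry, removed_after_minutes, window_minutes=60.0):
    """Toy 0..1 score for one sample run against one product.

    blocked_on_entry: the threat never got a foothold (pure robustness).
    removed_after_minutes: time until the product cleaned the infection up,
    or None if the threat held the machine permanently.
    """
    if blocked_on_entry:
        return 1.0
    if removed_after_minutes is None:
        return 0.0  # product disabled, or infection never removed
    # Partial credit for late recovery, decaying linearly over the window.
    return round(0.5 * max(0.0, 1.0 - removed_after_minutes / window_minutes), 3)
```

On this toy scale, blocking on entry scores 1.0, recovering after half the observation window scores 0.25, and a permanent compromise scores 0.0; a real test would also need to weight in the kind of damage done in the interim.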

Of course, most threats are about more than simply staying on the machine - it's all about gathering up your data and sending it off to be abused by the bad guys. But how this is handled could also be considered a resilience measure.

If a machine gets infected with a keylogger that is not initially spotted, some products might then detect it when it starts trying to read your bank account login details, or when it tries to send that information out to the internet.

In the case of the CryptoLocker threat currently grabbing the headlines, it might be that the malware is allowed to run, but blocked when it starts trying to make changes to files you've marked out as sensitive.

An analogy might be that robbers manage to break into a bank, but a security guard manages to pin them in the staff canteen until reinforcements arrive.

How well a product copes in these kinds of situations might well be very important, but it's rather tricky to measure.

It means first getting systems infected with malware - which means finding items that defeat the "robustness" layer - then leaving them infected, ideally with realistic everyday activity going on, until the product under test either does something about them or gives up the ghost.

That's pretty labour-intensive work, and tricky to automate. There's also a need for caution: running a machine infected with unknown malware risks creating dangers for the outside world - the machine could start spewing out spam, for example. So the tester needs to keep the risks as tightly controlled as possible.
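The monitoring loop at the heart of such a test could be sketched roughly like this - the two callbacks stand in for hypothetical lab plumbing that inspects the machine and pushes product updates, names of my own invention:

```python
def run_resilience_trial(is_threat_present, apply_updates, max_polls=24):
    """Poll an infected test machine until the product under test recovers.

    is_threat_present and apply_updates are hypothetical hooks into the lab
    harness; in a real setup each poll would be separated by hours of
    realistic activity, with the machine on an isolated network to contain
    the risk of it spewing spam or worse.
    """
    for poll in range(1, max_polls + 1):
        apply_updates()             # let the product fetch fresh definitions
        if not is_threat_present():
            return poll             # recovered after this many polls
    return None                     # never recovered: zero for resilience
```

The returned poll count (or None) feeds naturally into a time-based resilience score, since a quick recovery and an eventual one are clearly not equivalent.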

Even if you do manage to do all that, there's then a further issue of rating the relative successes of different products.

Resilience is highly dependent on the setting - in some situations, it might be fine for a system to go down completely as long as it bounces back quickly, while in others it's OK for the recovery to take a long time if the initial outage is only minor.

So, a tough proposition for us testers to work on, but one that could have some useful outcomes. Testing should show where products are less than perfect; if the world requires resilience then we need to see if products are providing it, and encourage them to do so if not.

The meeting was rounded off by a talk suggesting that in certain circumstances, and with the proper caution, it might be considered appropriate to create new malware for testing purposes, which generated the expected controversy, and a panel debating what areas might be ripe for deeper analysis by academic researchers.

The panel's conclusions were that there is room for much more active collaboration between industry and academia, with the resulting cross-pollination of ideas and resources leading to good things for both sides, and indeed the world at large.

On the evidence so far, I'd be inclined to agree. Events like WATeR can shift our thinking in all kinds of interesting new directions.

Image of clipboard courtesy of Shutterstock.



4 Responses to Are anti-virus testers measuring the right things?

  1. Tamas Feher · 697 days ago


    "If a threat can disable the security product and take complete control of the machine permanently,"

    That is more or less a given. Only incompetent malware writers fail to gut all security software as soon as the infection takes control, which is pretty trivial using various privilege escalation tricks.

    Security should run on separate hardware (an AV card plugged in beside the VGA card) if we are to prevent such occurrences.

    [Edited for brevity]

  2. Olaf · 697 days ago


    This article reminds me of a problem that we quite often see here with our Sophos installation: on our computers we perform a weekly full scan of the local hard disk. Occasionally this scan finds a virus, probably due to improvements in the virus definitions we got from Sophos in the meantime. So obviously the virus had already been installed, and perhaps was also active, before Sophos antivirus detected it.
    In such a case I would appreciate detailed information on the nature of the malware. For example, I'd like to know whether it contains a keylogger or an uploader of Firefox saved passwords, so that the user knows to change all his passwords. Perhaps it was even a trojan that allowed the attacker to inspect other victims on our intranet...

    So, more information on the threat might help us determine how to react to the detection (apart from cleaning it up or reinstalling the operating system).

    What do you think about this?

    • Maxim Weinstein · 696 days ago

      Hi Olaf,

      SophosLabs uses big data systems to identify and block malware automatically. These systems are optimized for providing the quickest and most accurate protection. They also sometimes detect malware based on sets of malicious characteristics or "genotypes" that span multiple malware families. Unfortunately, this sometimes means that the detections don't produce the most specific, detailed threat descriptions. As we continue to evolve our systems, we intend to put additional effort into identifying and communicating details that will help you better understand the threats that have been detected.


    • John Hawes · 695 days ago

      As Maxim says, thanks to generic detection techniques, these days most malware detections won't come with a nice clear name exactly identifying what's been spotted; you're more likely to get something vague and not too helpful.

      If you do have a name that looks specific, you can try looking that up in the AV vendor's threat database, but there may not be much detail there - no-one can provide full info on every single threat, so only the most important tend to be well described.

      There are some other options though, if you're willing to do a bit of digging. First up you could try looking up the detection name in VGrep (a tool provided by VB - just google vgrep). This will let you cross-reference the name against other people's detections of similar items, so you can then go check their threat lists to see if they have more details. Again, this relies on it being a fairly specific name I'm afraid - if you just put in "Generic.Trojan" you're likely to get thousands of hits.

      If that's no use, and you have kept a copy of the sample file (always handle suspected malware samples with extreme caution), you can try submitting it to VirusTotal (part of the Google empire these days) to see what other people are calling that exact sample, and again try your luck in the threat description lists to see if any are sufficiently detailed for your needs.

      The next option would be to try submitting to one of the public sandbox systems - these let you upload a file, which is then run inside a secure environment, and they produce a report on what they saw it doing. This might show you what dangers you might have been exposed to, although the reports can require some technical knowledge to understand properly.

      You can find links to some of these tools on the VB links page - google "vb malware identification" (may need some updating - my bad).
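      For the hash-first route mentioned above, something along these lines would do: compute the sample's SHA-256 locally and build a lookup against VirusTotal's public API (the v2 file/report endpoint; you'd need your own API key, and the key value here is just a placeholder).

```python
import hashlib
import urllib.parse

VT_REPORT_URL = "https://www.virustotal.com/vtapi/v2/file/report"

def sample_sha256(path):
    """Hash the suspect file in chunks, so large samples don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def report_query(api_key, file_hash):
    """Build the lookup URL; querying by hash means the sample itself
    never has to leave your machine."""
    return VT_REPORT_URL + "?" + urllib.parse.urlencode(
        {"apikey": api_key, "resource": file_hash})
```

      Fetching that URL (with urllib or similar) returns JSON listing the name each engine gives the sample, which you can then chase up in the vendors' threat databases.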


About the author

John Hawes is Chief of Operations at Virus Bulletin, running independent anti-malware testing there since 2006. With over a decade of experience testing security products, John was elected to the board of directors of the Anti-Malware Testing Standards Organisation (AMTSO) in 2011.