Note: In the interest of open disclosure, I am a senior malware researcher at SophosLabs with a keen interest in improving testing procedures, and a founding member of the Anti-Malware Testing Standards Organisation (AMTSO).
The latest anti-malware tests, performed by Simon Edwards for Dennis Technology Labs, show that comparative testing can actually be a strong indicator of how well a security solution can protect a user.
Conducting these types of tests is neither easy nor straightforward, but Dennis Technology Labs showed that it is indeed possible.
The internet is full of so-called testing bodies that scan their malware collections with a handful of on-demand scanners and publish the rankings.
Measuring on-demand detection rates provides a performance indicator, but it doesn’t actually tell users what they want to know: are their systems as safe as possible from the glut of malicious attacks out there?
Full product testing introduces what is known as a ‘malicious attack scenario’ to a victim computer.
Good tests try to emulate a known attack in a real-life setting, with the aim of recording the protective measures employed by the security software to stop the attack at the earliest stage. The ultimate goal is to assess how well the solution protected the victim system from attack.
These types of tests need to take into account system setup, user behaviour, installation configurations, etc. It is a complicated set of parameters to get right.
To illustrate this point, let’s look at web attacks using the most successful exploit kit we have seen in the last few years, the Blackhole exploit kit. There are a number of stages in the attack, and today’s multi-layered security solutions should attempt to prevent infection at each stage.
Stages of a web attack using the Blackhole exploit kit
First, this type of malicious attack is usually initiated via a simple web link. It can be delivered via email, drive-by exploit or in browser search results (a result of search engine poisoning).
In the case of email distribution, spam filters should filter out unwanted and malicious messages. With drive-by or search engine poisoning attacks, web filtering should catch iframes pointing to known malicious URL patterns and raise an alert.
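To make that first layer a little more concrete, here is a minimal sketch (in Python, with invented URL patterns that are not real Blackhole indicators) of the kind of hidden-iframe check a web filter performs. Production filters rely on large, constantly updated pattern sets and reputation data, not a short hard-coded list.

```python
import re

# Hypothetical indicators -- illustrative only, not real Blackhole signatures.
SUSPICIOUS_URL_PATTERNS = [
    re.compile(r"/main\.php\?page=[0-9a-f]{16}"),   # example pattern only
    re.compile(r"\.ru/[a-z]{8}/index\.php"),        # example pattern only
]

# A crude match for iframes that are hidden (zero-sized or display:none).
HIDDEN_IFRAME = re.compile(
    r'<iframe[^>]+src=["\'](?P<src>[^"\']+)["\'][^>]*'
    r'(width=["\']?0|height=["\']?0|display:\s*none)',
    re.IGNORECASE,
)

def scan_page(html: str) -> list[str]:
    """Return the hidden-iframe targets in `html` that match a known-bad pattern."""
    hits = []
    for match in HIDDEN_IFRAME.finditer(html):
        src = match.group("src")
        if any(p.search(src) for p in SUSPICIOUS_URL_PATTERNS):
            hits.append(src)
    return hits

if __name__ == "__main__":
    sample = '<iframe src="http://example.ru/abcdefgh/index.php" width="0" height="0"></iframe>'
    print(scan_page(sample))  # ['http://example.ru/abcdefgh/index.php']
```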
Second, the attack usually employs simple redirector scripts. These are often encrypted, which should trigger malicious script detection. Should the redirector scripts not be encrypted, URL filtering and reputation defences are designed to alert the user or administrator to suspicious activity.
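As an illustration of how a product might flag those encrypted redirectors, here is a rough sketch of a script-obfuscation heuristic. The entropy threshold and blob length are arbitrary values chosen for the example, not figures from any shipping engine.

```python
import math
import re
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Bits per character of the string; long obfuscated blobs tend to score high."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_obfuscated(script: str,
                     entropy_threshold: float = 5.0,   # arbitrary illustrative cut-off
                     min_blob_length: int = 200) -> bool:
    """Flag scripts that appear to decode large high-entropy string blobs at runtime."""
    uses_decoder = bool(re.search(r"\b(eval|unescape|String\.fromCharCode)\s*\(", script))
    blobs = re.findall(r'["\']([^"\']{%d,})["\']' % min_blob_length, script)
    return uses_decoder and any(shannon_entropy(b) > entropy_threshold for b in blobs)

# e.g. looks_obfuscated(open("redirector.js").read())  # hypothetical input file
```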
These redirections point to a hosting server, which, in the case of the Blackhole exploit kit, features a “traffic direction system” (TDS). Collecting browser and OS version information, the TDS returns a collection of exploits tailored to the specific vulnerabilities present in the victim environment.
Third, exploits are delivered to the system under attack. Delivered content may contain simple VBScript downloaders, PDF, Flash or Java exploits, along with some rarely used Windows vulnerabilities. This malicious content should be reliably detected by up-to-date exploit prevention modules, on-access scanners or content filtering components.
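One simple example of the content-filtering layer mentioned above is checking whether the declared Content-Type matches the payload's magic bytes, since exploit kits frequently serve archives, PDFs or executables under misleading headers. The sketch below uses a deliberately tiny signature table; a real component would know far more formats.

```python
# Hypothetical content-filter check: does the declared Content-Type match the
# file's magic bytes?
MAGIC_SIGNATURES = {
    b"%PDF":       "application/pdf",
    b"PK\x03\x04": "application/java-archive",   # also plain ZIP
    b"CWS":        "application/x-shockwave-flash",
    b"FWS":        "application/x-shockwave-flash",
    b"MZ":         "application/x-msdownload",
}

def content_type_mismatch(declared_type: str, payload: bytes) -> bool:
    """Return True when the payload's magic bytes contradict the declared type."""
    for magic, real_type in MAGIC_SIGNATURES.items():
        if payload.startswith(magic):
            return real_type != declared_type.split(";")[0].strip().lower()
    return False  # unknown format: leave the decision to other layers

# e.g. a "text/html" response that actually begins with an MZ executable header
print(content_type_mismatch("text/html", b"MZ\x90\x00..."))  # True
```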
Fourth, should the exploits be successful, the victim machine reconnects to the hosting server for the binary payload, which is subsequently downloaded and executed.
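There is a defensive layer at this final stage too. The sketch below assumes nothing more than a local SHA-256 blocklist checked before execution; real products add cloud reputation lookups, HIPS rules and behavioural monitoring, precisely because frequently re-packed Blackhole payloads defeat static lists quickly.

```python
import hashlib

# Placeholder entry -- not a real malware hash.
KNOWN_BAD_SHA256 = {"0" * 64}

def should_block(path: str) -> bool:
    """Return True if the downloaded file's SHA-256 is on the local blocklist."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() in KNOWN_BAD_SHA256
```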
And it is at this fourth stage that most real-time protection tests start their assessment: at the point where the URL points to the executable content. Attack stages one through three are rarely, if ever, considered.
This shortcoming in tests was highlighted at the 2007 International Antivirus Testing Workshop. Discussion on this topic led to the founding of the Anti-Malware Testing Standards Organization (AMTSO) the following year.
Improving the quality of testing – and testing in a way that truly mimicked a real user experience – seemed like mission impossible at the time. Fortunately, the combined forces of testers and anti-virus experts created some very useful guidelines and best practices.
Dennis Labs participated actively from the very early days of AMTSO, working on documents like the whole-product testing guidelines. It is good to see that they were not only interested in theory, but adopted these guidelines in practice as well.
You can read Dennis Labs’s self-imposed guidelines to testing web threats, but here are the highlights:
- They do not take sample feeds from vendors
- They try to always use URLs containing exploits
- They may include social-engineering attacks
- They use complex samples.
Of course, for a developer of multi-layered anti-malware solutions, the number one priority is the safety of its users’ systems.
People in the market for new security approaches look to independent tests as a guide.
And, yes, all vendors want to do well on tests, but they are so much more valuable if tests are properly executed and look at the whole security solution, rather than just some of the components.
Obvious question – are the results in? How did Sophos do?
Sophos, I believe, is covered in the Small Business AV section listed here: http://dennistechnologylabs.com/reports/s/a-m/201…
It would have been really useful if they'd tested Sophos as well. Just saying.
They did test Sophos too. 🙂
Great resource, thanks.
I know there is always some clown who will respond with… "why wasn't such and such included". And I also realize that not every AV can be included. But seriously, no Avast free?
Considering Avast free is generally recognized as being the most utilized (popular) anti-virus worldwide, it does seem a rather surprising omission.
I have used Sophos anti-malware but found it somewhat interferes with some programs that I know are safe. However, generally Sophos anti-malware tends to be one of the best for finding and removing malware, so I am happy with it.
I think too many people place too much emphasis on anti-malware tests. I've noticed a lack of consistency by the testing organizations themselves. For example, Vendor A's product is tested this month on a computer running Windows 7. It scores in the high 90s and is rated #1 by that testing organization. People jump on the bandwagon extolling Vendor A's product and people rush out to buy it.
But at the same time, Vendor A's product is tested by another company on a machine running Windows XP and scores in the low 90s. This is not comparative testing.
To add to the confusion, the testing company that scored Vendor A's product in the high 90s this month has a totally different result when they test Vendor A's product the following month. No wonder there will never be a unanimous decision on which anti-malware product is best.
Maybe this is too simplistic an approach, but I don't think any anti-malware product will be 100% effective 100% of the time. If there was such a product we'd all be using it. Therefore, I think that using any anti-malware product is better than using no anti-malware product at all.
Perhaps more time needs to be spent educating people not to visit dodgy websites, open emails from Nigerian royalty, or click on naked pictures of Justin Bieber, Mila Kunis, et al.
When Vendor A has different scores in two tests performed around the same time by two different testers, it may well be because the testers are following different methodologies – for example, one is executing an on-demand test, the other a live protection test. There must also be a difference in the test sets: two independent testers will almost certainly not use the same test set (though there may be a large overlap), and depending on their selection the results for the same vendor will differ.
When Vendor A has different scores with the same tester in subsequent tests, the most likely reason is that the tester changed the test set (as they should), and the new samples in circulation in the given period are detected more or less effectively by the product.
To interpret the test results, you have to understand what the test was about. The final score is not of much use without that.
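To put some toy numbers on that point (all invented), the same product scored against two overlapping but different 500-sample test sets can land in the "high 90s" for one tester and the "low 90s" for another:

```python
# Toy arithmetic only -- the sample sets and detections are made up.
product_detects  = set(range(485)) | set(range(500, 722))  # samples the product catches
tester_a_samples = set(range(0, 500))                      # Tester A's 500 samples
tester_b_samples = set(range(250, 750))                    # Tester B's 500 samples, 250 shared with A

score_a = len(product_detects & tester_a_samples) / len(tester_a_samples)
score_b = len(product_detects & tester_b_samples) / len(tester_b_samples)

print(f"Tester A: {score_a:.0%}")   # 97%
print(f"Tester B: {score_b:.0%}")   # 91%
```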
Where's Zone Alarm?
Not tested, possibly due to time or resource limitations. The test lab selected only 8 of the possible 30+ products in each category.
A few metrics I'd like to see when these effectiveness studies are performed are the computing, bandwidth, and user experience costs: how hard the PC ran during each test, how much extra bandwidth was used, and how the user's daily actions were negatively affected by the use or presence of the security software.
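Something along those lines is straightforward to prototype. As a rough sketch, assuming the third-party psutil library and a placeholder workload function, one could sample wall time, CPU time and network traffic around a test run like this:

```python
import time
import psutil  # third-party: pip install psutil

def measure_cost(workload):
    """Run `workload()` and report wall time, CPU time and network bytes used.

    This measures the whole machine, so run it on an otherwise idle test image,
    the same way the protection tests themselves are set up.
    """
    net_before = psutil.net_io_counters()
    cpu_before = psutil.cpu_times()
    wall_before = time.monotonic()

    workload()

    wall = time.monotonic() - wall_before
    cpu_after = psutil.cpu_times()
    net_after = psutil.net_io_counters()

    busy = (cpu_after.user - cpu_before.user) + (cpu_after.system - cpu_before.system)
    transferred = (net_after.bytes_recv - net_before.bytes_recv) + \
                  (net_after.bytes_sent - net_before.bytes_sent)

    print(f"wall time : {wall:.1f} s")
    print(f"cpu time  : {busy:.1f} s")
    print(f"network   : {transferred / 1024:.0f} KiB")

# e.g. measure_cost(lambda: open_sample_page_in_browser())  # hypothetical workload
```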