How much malware does SophosLabs detect?

To infinity and beyond

SophosLabs has discovered a technique in anti-virus marketing, which we detect as Spin/BigNumber-P. Typical behaviour involves phrases such as “Product detects X viruses!”, where X is a large, rather exact-sounding number. Some variants involve high-tech numerical displays updated in real time with ever-growing numbers. This technique has been spotted in the wild.

Never one to be left out, SophosLabs would now like to publish the number of malicious files we detect:

∞


Yes, that’s right: We currently detect an infinite number of malicious files. While that shouldn’t surprise those familiar with SophosLabs, let me explain.

Talking about a specific number in relation to total malicious file detections reveals a misunderstanding of how malware and malware detection operate. The vast majority of threats we see are polymorphic, meaning we see many variations of each threat. Some are modified by the malware authors, others are generated by server-side programs, and others modify themselves as they spread. And then there are file-infecting viruses, which can potentially turn any clean file on a system into a malicious one. An infinite number of threats.

When a quick response is required, our analysts and automated systems can block a specific file. But the bulk of our protection comes from generic detection, which looks for characteristics of known malware rather than an exact match. Just one such identity might detect hundreds, thousands, or an infinite number of variants.

Let me be clear: There are no practical limits on the number of different files we can detect, nor on the number of identities our product can handle. If we were relying solely on exact file matches using checksums, we might quickly run into performance issues and memory limits, or have to restrict detection to only the most active threats (a practice followed by some other vendors). Instead, we maintain a multi-layered detection framework based on static characteristics and/or run-time behaviours.
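To make the contrast between exact checksum matching and generic detection concrete, here is a minimal sketch in Python. It is an illustration only, not how our engine works: the sample bytes, the marker pattern and the function names (make_variant, exact_match, generic_match) are all hypothetical, and a real identity looks at far richer characteristics than a single byte string.

```python
import hashlib

# Hypothetical byte pattern shared by every variant of one made-up threat.
GENERIC_MARKER = b"evil-payload"


def make_variant(n: int) -> bytes:
    """Build a made-up variant: same core, different suffix and padding."""
    return b"MZ...evil-payload-v%d" % n + b"\x00" * n


# Exact matching needs one checksum per known file.
KNOWN_BAD_SHA256 = {hashlib.sha256(make_variant(0)).hexdigest()}


def exact_match(data: bytes) -> bool:
    """Detect only files whose checksum is already on the list."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256


def generic_match(data: bytes) -> bool:
    """Detect any file carrying the shared characteristic."""
    return GENERIC_MARKER in data


variants = [make_variant(n) for n in range(1, 1001)]
exact_hits = sum(exact_match(v) for v in variants)
generic_hits = sum(generic_match(v) for v in variants)
print(exact_hits, generic_hits)
```

In this toy setup the checksum list catches only the original sample, while the single generic rule catches every one of the thousand new variants without growing at all.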

Even with such an impressive detection number, we’re not about to rest on our laurels. In fact, by this time tomorrow, we will detect an even larger infinite number* of files. Larger in the sense that we will have written detection for infinitely many new pieces of malware, in addition to still detecting everything we detect today. If that seems counterintuitive, consider that there are infinitely many even numbers (2, 4, 6, …), yet there are also infinitely many odd numbers (1, 3, 5, …) that fit between them. With infinity there is always room for more. See Hilbert’s paradox of the Grand Hotel for an example.
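For the sceptical, the even-and-odd argument can be sketched in a few lines of Python. The generator names (evens, odds, merged) are made up for this illustration. The point is simply that two endless lists can be merged into a single endless list, with every item still getting its own position.

```python
from itertools import count, islice


def evens():
    """Endless stream of even numbers: 2, 4, 6, ..."""
    return (2 * n for n in count(1))


def odds():
    """Endless stream of odd numbers: 1, 3, 5, ..."""
    return (2 * n - 1 for n in count(1))


def merged():
    """Interleave both endless streams into one endless stream."""
    for odd, even in zip(odds(), evens()):
        yield odd
        yield even


first_eight = list(islice(merged(), 8))
print(first_eight)
```

The merged stream never runs out of room, which is the same trick the Grand Hotel uses to accommodate its new guests.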

Too theoretical? Consider Troj/VB-EUH which, when run, creates about 100 variations of itself on the host system. Running any of these variants on a new system will create 100 more variations. It’s easy to see how an automated system left running could quickly create hundreds of thousands of new, malicious files, all of which would be detected as Troj/VB-EUH.
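The arithmetic is easy to check with a rough sketch. The function name files_after is hypothetical, and the model assumes exactly 100 distinct variants per run and that every variant from each generation is run once on a fresh system, which is a simplification of the "about 100" observed.

```python
def files_after(generations: int, variants_per_run: int = 100) -> int:
    """Total new files created if every variant of each generation is run once.

    Assumes each run yields exactly `variants_per_run` distinct files.
    """
    total = 0
    current = 1  # start from the single original sample
    for _ in range(generations):
        current *= variants_per_run  # each file in this generation spawns more
        total += current
    return total


for g in range(1, 4):
    print(g, files_after(g))
```

Under these assumptions the total is 100 files after one generation, 10,100 after two and 1,010,100 after three, so "hundreds of thousands" is passed within three rounds.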

Hmmm, maybe it is time to get our own real-time counter.

* Experts in pure mathematics and set theory will know that actually it is the same number, Aleph Null, the countable infinity. Those wishing for us to go further and detect an uncountable number of malicious files will have to wait for the release of Sophos Quantum Anti-virus.