Browsing the BBC website this morning, I came across this reference to how many malicious code threats are out there. Apparently the number is 1,122,311. That's a pretty big number by anyone's standards, and it is the number officially quoted in a competitor's six-monthly threat report. These reports always contain lots of useful information. The latest Sophos threat report will be published very soon, and you will be able to see our take on the security landscape.
However, what caught my attention about this number was the accuracy of it. Just how do these guys know exactly how many threats are out there? Here am I, the manager of the largest threat lab that Sophos has and guess what - I don't actually know how many threats are out there.
Let's put the malware threat into perspective. We all know that the numbers are increasing astronomically. When I first joined Sophos in September 1997 the number of known threats was 11,950. I have the records of what the product has stated going back almost to the start of the company. In those days we received a handful of samples each day from a few different sources.
Looking at the figures for last week shows that we received approximately 114,000 unique files from different sources (excluding customers). Some of these files we will already have detected and the rest are going to be a mix of malware, PUAs, clean files and general tat.
This week alone one of our sources of files has started sending us approximately 22,000 different files each day. This actually amounts to over 7GB of files every day. Again they will be a complete mix of files that we need to process.
Yesterday, on AVIEWS, there was a report of a new piece of malware doing the rounds. The report stated that Sophos already detected the file. As the report contained the MD5 of the file, I ran it through our files database. The result showed that we did not have the actual file, even though we already detected it.
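For readers who have not done this sort of check, the MD5 lookup described above can be sketched roughly as follows. This is only an illustration and not how our files database actually works: the names `md5_hex`, `have_sample` and `known_hashes` are made up for the example, and a real database would be rather more than a Python set.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def have_sample(md5: str, known_hashes: set) -> bool:
    """Check whether a reported MD5 matches a file we already hold.

    known_hashes is a hypothetical stand-in for a files database:
    a set of lowercase hex MD5 strings, one per stored sample.
    """
    return md5.strip().lower() in known_hashes


# Toy "database" holding the hashes of two stored samples.
known_hashes = {md5_hex(b"sample one"), md5_hex(b"sample two")}

# A hash quoted in a report for a file we have never received.
reported = md5_hex(b"sample three")

print(have_sample(md5_hex(b"sample one"), known_hashes))  # a file we hold
print(have_sample(reported, known_hashes))                # a file we do not hold
```

Note that the lookup only says whether we hold the file. Whether the product detects it is a separate question, which is exactly the situation described above.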
The point here is that we are receiving all these files for analysis but there are many more out there that we already detect, principally through behavioral genotypes, that we don't receive. If we already detect them then often they do not get sent to us.
Recently we created a new positive testing system. This system tests our product against all the files that we detect. It runs every few hours and is aimed at ensuring that we never lose detection for a file. Detection is much more complicated than just signatures and we don't want to risk losing detection through an unwanted side effect. This system currently holds over 4.6 million unique files and is being added to every day. These will not necessarily be individual malware variants - there will be plenty of replicants of viruses in there.
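The idea behind a positive testing run can be shown in a few lines. This is a minimal sketch under stated assumptions, not the system itself: `find_lost_detections`, `toy_detector` and the three-file corpus are all hypothetical, and the toy detector is a simple byte match standing in for a real scanning engine.

```python
from typing import Callable, Dict, List


def find_lost_detections(
    corpus: Dict[str, bytes],
    is_detected: Callable[[bytes], bool],
) -> List[str]:
    """Return the names of corpus files that the detector no longer flags.

    Every file in the corpus is one we expect to detect, so any name
    returned here is a regression that needs investigating.
    """
    return sorted(name for name, data in corpus.items() if not is_detected(data))


def toy_detector(data: bytes) -> bool:
    """Hypothetical stand-in for the product: flags files containing b'EVIL'."""
    return b"EVIL" in data


# Toy corpus of files that were all detected at some earlier point.
corpus = {
    "sample_a": b"xxEVILxx",
    "sample_b": b"EVIL payload",
    "sample_c": b"obfuscated variant",  # detection lost after a detector change
}

lost = find_lost_detections(corpus, toy_detector)
print(lost)  # ['sample_c']
```

Running a check like this on a schedule means a lost detection shows up within hours rather than when a customer reports it.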
What is the purpose of all these numbers? It is simply to say that the threat is enormous, but it is difficult to see how any person or company can say exactly how many threats are out there. Yes, Sophos still puts a virus count into the product every month. To be honest, that is done because there is a bit of code that requires there to be a number there and, whilst we try to have the number make sense, I would take it with a pinch of salt.
In fact, thinking about the growth of threats in general, I might even be tempted to start a sweepstake on which AV company is going to be the first to claim 10 million threats and when they are going to claim it. Then I will ask them to prove it.