How large is a piece of malware?

Q. What is the average size of a typical malware file?

Of course there is no definitive answer to this question, and different kinds of malware can have vastly different sizes, but for those wanting an answer I ran a quick calculation over some of SophosLabs’ monthly collections of malware samples.

In January 2005 the average size of a malware sample was 126 kB. In June 2010 it is 338 kB.
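For readers who want to repeat this kind of quick calculation on their own collection of files, here is a minimal sketch. It is not the script used for the figures above. The function names are hypothetical, and it assumes 1 kB means 1024 bytes, which the figures above do not specify.

```python
from pathlib import Path


def average_size_kb(directory: str) -> float:
    """Mean size in kB of all regular files under `directory`.

    Assumption: 1 kB = 1024 bytes.
    """
    sizes = [p.stat().st_size for p in Path(directory).rglob("*") if p.is_file()]
    if not sizes:
        raise ValueError(f"no files found under {directory}")
    return sum(sizes) / len(sizes) / 1024


def growth_factor(old_kb: float, new_kb: float) -> float:
    """How many times larger the new average is than the old one."""
    return new_kb / old_kb


# The two averages quoted above: 126 kB (January 2005) and 338 kB (June 2010).
print(round(growth_factor(126, 338), 2))  # 2.68
```

On those two figures, the average sample grew to roughly 2.7 times its 2005 size in five and a half years.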

This growth in size is pretty much what one would expect, and there are several reasons for it. Long gone are the days of hand-crafted assembler code designed to be as small as possible. As computer memory, disk space and internet bandwidth grow, so does the output of a typical compiler. Software libraries become larger, and software (both legitimate and malicious) tends to contain increasing amounts of complexity and functionality.

Q. Can you give some examples for specific kinds of malware?

Troj/JSRedir-BV is an obfuscated JavaScript file, typically seen attached to spam email messages. If the attachment is opened, the web browser is redirected to a scam web site. Such redirection could be done in one line of JavaScript, but because of the heavy obfuscation used, a Troj/JSRedir-BV script is typically 3 kB to 5 kB in size.

Mal/Dloadr-Y is a downloading Trojan with functionality to change firewall settings, download a configuration file from a remote website, then download further malware as dictated by the configuration file. Samples of Mal/Dloadr-Y are typically 25 kB to 30 kB in size.

FakeAV Trojans are rogue anti-virus applications that display fake infection warnings to try to scare users into paying for cleanup. There are many different families of FakeAV, and even within a family there can be a large variation in size. For example, samples of Mal/FakeAV-DO range from about 300 kB to over 1 MB. These variations arise partly because FakeAV authors frequently change packing or encryption techniques. Furthermore, in some cases each sample contains random amounts of junk data in an attempt to evade detection.

Viruses, although often relatively small in themselves, can infect legitimate applications of any size. For example, a typical variant of W32/Scribble-B contains about 20 kB of viral code, but infected applications can be just a few kilobytes or many megabytes in size.

W32/Scribble-B also injects a malicious iframe into htm, php and asp files. The iframe is just one line of HTML (about 80 bytes), but the infected web pages can be of any size. However, the iframe is always added at the end of the file, so it is easy to find and is detected as Troj/Fujif-Gen.
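To illustrate why an addition at the end of the file is easy to find whatever the page size, here is a rough sketch that inspects only the last few hundred bytes of a page. It is not how Troj/Fujif-Gen detection actually works. The function name, the regular expression and the 256-byte window are all assumptions chosen for illustration, and a trailing iframe on its own does not prove that a page is infected.

```python
import re

# Hypothetical pattern: any opening <iframe ...> tag, case-insensitive.
IFRAME_RE = re.compile(rb"<iframe\b[^>]*>", re.IGNORECASE)


def has_trailing_iframe(data: bytes, tail_bytes: int = 256) -> bool:
    """Return True if an <iframe> tag opens within the last `tail_bytes` of `data`.

    Only the tail is searched, so the cost does not depend on the page size.
    """
    if tail_bytes <= 0:
        raise ValueError("tail_bytes must be positive")
    tail = data[-tail_bytes:]  # the whole page if it is shorter than the window
    return IFRAME_RE.search(tail) is not None


# A large page with an iframe appended after the closing tag.
page = b"<html>" + b"x" * 100_000 + b"</html>"
injected = page + b'<iframe src="http://example.invalid/" width=1 height=1></iframe>'
print(has_trailing_iframe(page))      # False
print(has_trailing_iframe(injected))  # True
```

The window is deliberately larger than the roughly 80-byte iframe so that the whole tag falls inside it.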

Q. As malware gets larger, does Sophos’ scanning get slower?

From a customer point of view, this is the wrong question. Whilst SophosLabs has an ever-increasing collection of malware (and increasingly powerful hardware to extract and analyze lots of data from it), the existence of malware on a customer machine should be a pretty rare thing. If the virus engine spends a few milliseconds identifying a malicious file, that is no big deal. What matters is that it gets through a typical clean file in microseconds, not milliseconds. So the real question is: as legitimate software gets larger, does SAV (Sophos Anti-Virus) get slower?

Actually, individual file size has very little impact on Sophos’ scanning speed. Here in the labs we put a great deal of thought into optimizing the performance of our detection identities. Instead of linearly scanning through whole files for fixed patterns, each identity targets only those parts of the file where it needs to look.

To take an analogy, suppose you have misplaced your cell phone. Rather than starting at one end of the house and slowly working your way to the other, searching everywhere with a fine-tooth comb, you probably stop and think: Where am I most likely to have left it? Where did I last use it? Where have I been since then? There is no need to check the attic if you haven’t been up there all week. Quite quickly you will identify the most important places to look. Even better, if you have access to another phone you can call your cell phone and listen for where it is ringing.

Sophos’ identities use all sorts of shortcut techniques like that. For an executable file, one obvious place to check is the point from which code execution begins. The virus engine automatically loads some of this code, and many identities start by checking it. If it doesn’t match an expected pattern then many identities don’t need to look any further, whether the file is 10 kB or 10 MB. Even identities designed to detect such nasties as polymorphic, mid-infecting viruses use a clever combination of emulation and statistical pattern checking to scan in only a few key places. (Polymorphic means the code changes every time, so there is no fixed pattern to look for. Mid-infecting means the viral code is not at the entry point.)
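As a toy model of the entry-point shortcut, the sketch below compares a small window of bytes at a given offset against each identity's expected prefix and never reads the rest of the file. It is not Sophos' engine or identity format. The class, the function, the identity name and the byte pattern are invented for illustration, and the entry-point offset is passed in directly rather than parsed from a real executable header.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EntryPointIdentity:
    """Hypothetical identity: a name plus the bytes expected at the entry point."""
    name: str
    pattern: bytes

    def matches(self, entry_code: bytes) -> bool:
        return entry_code.startswith(self.pattern)


def scan_entry_point(
    data: bytes,
    entry_offset: int,
    identities: Iterable[EntryPointIdentity],
    window: int = 64,
) -> List[str]:
    """Names of the identities whose pattern matches the code at the entry point.

    Only `window` bytes are examined, so the work done is the same
    for a 10 kB file as for a 10 MB one.
    """
    entry_code = data[entry_offset:entry_offset + window]
    return [ident.name for ident in identities if ident.matches(entry_code)]


# Arbitrary marker bytes stand in for a real code pattern.
demo = EntryPointIdentity("Example/Demo-A", b"\xde\xad\xbe\xef")
small = b"\x00" * 100 + b"\xde\xad\xbe\xef" + b"\x00" * 10_000
large = b"\x00" * 100 + b"\xde\xad\xbe\xef" + b"\x00" * 1_000_000
print(scan_entry_point(small, 100, [demo]))  # ['Example/Demo-A']
print(scan_entry_point(large, 100, [demo]))  # ['Example/Demo-A']
```

A real identity would go on to further checks after a match. The point here is only that a mismatch at the entry point lets it give up immediately.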

Q. Is there an upper limit on the size of file SAV scans?

I was quite surprised to learn that some AV scanners have quite stringent limits like this, presumably in order to optimize their scanning performance. Some even have a configurable global setting where you can choose between a low limit (better performance, but risks missing some malware) or a higher one (finds more malware, but slower scanning).

That is far from ideal. We have already seen how different malware families tend to have different sizes. So in SAV, instead of a global file size limit, each individual identity can (if necessary) specify appropriate limits according to the kind of malware it is trying to detect. As we have already observed, an identity to detect a virus has to scan files of any size, but can be optimized by knowing what to look for and where to look. Meanwhile, many generic identities that detect particular malware families can make use of size optimizations. A typical family of internet banking Trojans might be, say, between 3 and 4 MB. That is just one of several pieces of information that an identity might use to quickly eliminate 99.9% of clean files from further scanning. Further investigation will only happen on those files that warrant it.
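The idea of per-identity size limits, as opposed to one global limit, can be sketched as follows. This is only an illustration under stated assumptions: the `SizedIdentity` class, the `candidates` function and both identity names are hypothetical, and the 3 to 4 MB range is the example figure from the paragraph above, taken here as 1 MB = 1024 × 1024 bytes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass(frozen=True)
class SizedIdentity:
    """Hypothetical identity with optional size bounds in bytes (None = no bound)."""
    name: str
    min_size: Optional[int] = None
    max_size: Optional[int] = None

    def size_in_range(self, size: int) -> bool:
        if self.min_size is not None and size < self.min_size:
            return False
        if self.max_size is not None and size > self.max_size:
            return False
        return True


def candidates(identities: Iterable[SizedIdentity], file_size: int) -> List[str]:
    """Names of the identities that still need to look at a file of this size."""
    return [i.name for i in identities if i.size_in_range(file_size)]


identities = [
    # A family-specific identity can rule files out by size alone.
    SizedIdentity("Example/Banker-Gen", min_size=3 * MB, max_size=4 * MB),
    # A virus identity sets no bounds, since infected hosts can be any size.
    SizedIdentity("Example/Virus-A"),
]

print(candidates(identities, 50 * 1024))      # ['Example/Virus-A']
print(candidates(identities, 3 * MB + 1234))  # ['Example/Banker-Gen', 'Example/Virus-A']
```

Because the bounds live on each identity rather than on the scanner, widening the range for one family leaves every other identity untouched.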


If we start to see new variants of that family increasing in size then SophosLabs can at any time issue an update with new size ranges. Similarly we can update many other checks to reflect the changes we are seeing. That is the reason why many of our generic detections ask customers to send in samples. Even when we proactively detect a new sample, we want to keep monitoring trends and stay one step ahead of the game.

So Sophos customers do not need to worry about the typical size of malware files, nor do they need to worry about setting file size limits. SophosLabs is always monitoring the trends and making any necessary performance decisions for you.

With the recent launch of SAV 9.5 the labs are getting more data than ever before. Whenever a generic identity detects a file, the size of that file is one of the key pieces of data that can be automatically sent back to SophosLabs. Automatic feedback only happens if customers consent to that option, but we have been very pleased by the number of customers turning it on. Sophos is already a leader in proactive detection, and with this new feedback data we can fine-tune that detection to be even better! Thank you for helping us to help you.