Several blog postings over the past few months have described web attacks that SophosLabs have identified (1,2,3,4). Many of these attacks involve compromised sites: legitimate web pages that hackers have modified so that malicious content is loaded from a remote server when the legitimate page is browsed. The malicious content on those remote servers (which we will call 'attack sites') is increasingly created with publicly available kits such as MPack or IcePack (and various copycats). For those interested, a recently published technical paper gives an overview of the types of web attacks we see.
One of the exciting things about web attacks is the speed with which they move. In cases where high profile sites are compromised, by the time you hear of it, the site has often been cleaned up. Even in less prominent attacks, the attack site used may well get taken down long before the many compromised sites are cleaned up (if they ever are). The job of the researcher goes beyond adding detection for one file or another used in the attack. Harvesting of all the components is essential if we are to provide optimum protection (5). Collecting data about all the files, URLs and exploits used in an attack helps us to ensure:
- we detect the relevant components
- we block the relevant components (URL filtering via WS1000)
- our pro-active protection abilities (e.g. HIPS and BOPS protection) keep up to date with new threats
Collecting this information is non-trivial. The bulk of malicious scripts are obfuscated, and the web is a large 'place'. Lack of visibility into attacks is something that has frustrated me for a while now. Quoting numbers of detections together with geographical location is useful, but does not give you information about specific attacks, and their purpose. So, I set about trying to automate the process of analyzing scripts and extracting relevant information. One of the requirements was to dynamically generate flowcharts to give a visual representation of how an attack works. Results look promising, and really give an insight into attacks and their purpose.
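As a rough illustration of the flowchart idea (not the actual SophosLabs system), the relationships harvested from an attack can be treated as a list of "X loads Y" edges and written out in Graphviz DOT format, which a tool such as `dot` can then render as an image. The function name, the edge labels and the example hostnames below are all made up for the sketch.

```python
def quote(text):
    """Quote a string for use as a DOT identifier or label."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(edges):
    """Render (source, target, label) triples as a Graphviz DOT digraph."""
    lines = ["digraph attack {", "  rankdir=LR;"]
    for src, dst, label in edges:
        lines.append(f"  {quote(src)} -> {quote(dst)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data: one compromised site leading to an attack site,
# which serves an exploit page that installs a payload.
edges = [
    ("compromised-site.example", "attack-site.example", "iframe"),
    ("attack-site.example", "exploit page", "script"),
    ("exploit page", "payload", "download"),
]
dot = to_dot(edges)
print(dot)
```

Saving the output to a file and running it through Graphviz gives a left-to-right chart with one box per component, which is the general shape of picture described here.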
To illustrate this I have taken three examples, all extracted from data harvested in a 24-hour snapshot yesterday.
1. Dorf ecard attack
The data shows a single machine serving up a malicious script (detected as Troj/JSXor-Gen) which delivers various exploits in order to infect the victim with Mal/Dorf-E. This fits into what we know about the ecard attacks - single compromised machines serving as infection points.
2. Single attack site serving up downloader Trojan
This attack bears similarities to the previous one in that a single machine, again using a malicious script detected as Troj/JSXor-Gen, attacks the victim with exploits intended to install malware. In this case, the malware belongs to Clagger, a notorious family of downloader Trojans which download and install other malware. So, we have a partial picture, but not the complete one. For that we need to link the system up with our other replication systems so we get insight into what Clagger downloads.
3. Distributed attack
This is a good example of an attack using compromised sites to 'guide' victims to the attack site, from where they are hit with exploits and infected with malware. In this case, numerous legitimate sites are compromised with a malicious script (detected as Mal/ObfJS-C) which writes an iframe to the page in order to load content from an attack site. The attack site in turn loads content from another site, which loads various pages that each attempt to exploit the victim's browser to install malware (in this case pro-actively detected as Mal/Heuri-E). Interestingly, some of the compromised sites are detected as something other than Mal/ObfJS-C, because the site has been compromised more than once!
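One step in following a chain like this is pulling the iframe targets out of a page once the malicious script has done its work. The sketch below assumes the hard part is already done, that is, the script has been deobfuscated and we are holding the HTML it writes. It uses only Python's standard `html.parser` and `urllib.parse`; the class name, function name and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeCollector(HTMLParser):
    """Collect the absolute URLs that iframes on a page point to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value).
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative URLs against the page that was fetched.
                self.targets.append(urljoin(self.base_url, src))


def iframe_targets(base_url, html):
    """Return the iframe destinations found in already-deobfuscated HTML."""
    collector = IframeCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.targets


# Hypothetical page content after the injected script has been decoded.
page = (
    "<html><body><p>Welcome</p>"
    '<iframe src="http://attack-site.example/in.php" width=0 height=0></iframe>'
    "</body></html>"
)
found = iframe_targets("http://compromised-site.example/index.html", page)
print(found)
```

Each URL found this way becomes the next page to fetch and analyze, which is how a chain of compromised site, attack site and exploit pages can be walked one hop at a time.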
Using flowcharts like those above really helps to explain how web attacks work. More importantly, the system also enables us to ensure we have protection in place for the various components of an attack, and, if not, escalate appropriately. Where we pro-actively detect the final Trojan installed in the attack, the detections are predominantly Behavioral Genotype detections. This is not surprising: effective generic detection technology is required to combat the aggressive server-side automation used in these attacks to frequently update, re-encrypt and re-pack the malicious files.
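The "do we have protection for every component" check can be pictured as a simple pass over the harvested components, flagging any that carry no detection. This is only a sketch under assumed data: the `Component` record, its fields and the example URLs are hypothetical, and only the detection names are taken from the examples above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    url: str                          # where the component was harvested from
    role: str                         # e.g. "injected script", "payload"
    detection: Optional[str] = None   # detection name, or None if undetected


def needs_escalation(components: List[Component]) -> List[Component]:
    """Return the components of an attack that nothing currently detects."""
    return [c for c in components if c.detection is None]


# Hypothetical harvest from one distributed attack.
harvest = [
    Component("http://compromised-site.example/", "injected script", "Mal/ObfJS-C"),
    Component("http://attack-site.example/in.php", "redirect page"),
    Component("http://attack-site.example/load.exe", "payload", "Mal/Heuri-E"),
]
gaps = needs_escalation(harvest)
for component in gaps:
    print(f"escalate: {component.role} at {component.url}")
```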