Paul Ducklin takes an in-depth look at the scale and the risk of the typosquatting industry: registering mis-spellings of popular domains in an attempt to profit from typing mistakes.
Applying every possible one-character typo to the domain names of Facebook, Google, Twitter, Microsoft, Apple and Sophos, Ducklin collected HTTP data and browser screenshots from 1502 web sites and 14,495 URLs.
In this report, he analyses the data to paint a fascinating picture of the typosquatting ecosystem, finding surprisingly little malware, but nevertheless plenty of risk.
A Naked Security reader recently asked us to investigate the scale and the risk of typosquatting, after she accidentally put herself in harm’s way by mistyping a popular URL.
She meant to visit posterous.com, but typed the linguistically-similar posterious.com by mistake. She was immediately and automatically redirected to a site which was blocked by Sophos Endpoint Security because it contained malware. Indeed, posterious.com redirects at the whim of its operator, taking you to different sites each time you visit.
As you can see below, posterious.com took us to a product comparison site, an online coupon site and then to a generic search site commonly seen on typosquat domains.
Typosquatters register mis-spellings of popular domains in the hope that they will be able to make money out of traffic from unintentional typing mistakes, or fat-finger errors, made by internet surfers.
So, how bad is typosquatting? What sort of risk do fat-fingers pose?
We decided to find out.
We chose six domains: Facebook, Google, Twitter, Microsoft, Apple and, while we were about it, Sophos.
To keep things simple but representative, we limited ourselves to typos of one alphabetic character in the company name: one letter omitted, one letter mistyped, or one letter added. Typos involving numbers or punctuation marks were ignored.
We generated all possible one-character mistakes in the http://www.companyname.com form of the above six domains. That produced 2249 unique site names, from http://www.pple.com, through http://www.facemook.com, to http://www.twitterz.com.
Of course, a few of these generated names are meaningful in their own right. The domain http://www.racebook.com, for example, sounds like a betting site, and it is. Goole.com is a site about Goole, a large port on the East coast of England. And witter.com is a site owned by an American called Glen Witter.
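As a rough illustration of the three typo rules described above, here is a minimal Python sketch. It is not the script used in this study: the function names one_char_typos and typo_urls are our own inventions for this example, and because the exact rules (for instance, how duplicates were handled) are not spelled out, its counts may differ slightly from the totals quoted in this report.

```python
import string


def one_char_typos(name):
    """Return the set of distinct one-letter typos of `name`.

    Covers the three rules from the text: one letter omitted,
    one letter mistyped, one letter added. Only a-z is considered;
    digits and punctuation are ignored.
    """
    letters = string.ascii_lowercase
    typos = set()

    for i in range(len(name)):
        # One letter omitted.
        typos.add(name[:i] + name[i + 1:])
        # One letter mistyped: replace it with any *different* letter.
        for c in letters:
            if c != name[i]:
                typos.add(name[:i] + c + name[i + 1:])

    # One letter added, at any position including both ends.
    for i in range(len(name) + 1):
        for c in letters:
            typos.add(name[:i] + c + name[i:])

    # Using a set means repeats (e.g. dropping either "p" in "apple")
    # are only counted once.
    return typos


def typo_urls(name):
    """Format each typo in the http://www.companyname.com style."""
    return sorted("http://www.%s.com" % t for t in one_char_typos(name))
```

Calling typo_urls("apple") produces a list which includes http://www.pple.com, and typo_urls("twitter") includes http://www.twitterz.com.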
The scale of typosquatting
Not all of the possible one-character-wrong names for any domain will be registered and in use. To get an idea of how many typosquats to expect for a business domain, we started with the mutated versions of http://www.sophos.com.
Although more than 100 million users worldwide are protected by Sophos, we don’t have consumer products, an online music store, webmail, a search engine or a social network. So we don’t have millions of users attempting to type in and visit our URL each day. This suggests that http://www.sophos.com ought to be representative of a lightly-squatted part of the domain name space.
Sophos does have squatters hoping for occasional search traffic or for the chance to sell on a likely domain name, but only a few. Using the Domain Name System (DNS) to verify registered and resolvable domains from our machine-generated list, we came up with a ratio of 56 out of 333, or 17%.
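The DNS check described above might be sketched along the following lines. This is an assumption-laden example rather than our actual tooling: the names resolves and resolvable_share are hypothetical, and the lookup simply relies on Python's standard socket.getaddrinfo call. Bear in mind that a name which resolves is only evidence of registration, and that a resolver which rewrites failed lookups would inflate the count.

```python
import socket


def resolves(hostname, lookup=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `lookup` is injectable (an assumption made for this sketch) so the
    function can be exercised without touching the network.
    """
    try:
        return len(lookup(hostname, 80)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name could not be resolved.
        # UnicodeError: the name is not a valid hostname.
        return False


def resolvable_share(hostnames, lookup=socket.getaddrinfo):
    """Return (list of resolvable hostnames, fraction that resolved)."""
    hostnames = list(hostnames)
    live = [h for h in hostnames if resolves(h, lookup)]
    share = len(live) / len(hostnames) if hostnames else 0.0
    return live, share
```

Feeding this function the machine-generated host names for one brand gives back the names that resolved and the fraction of the list they represent.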
Things were quite different for the other brands in our survey.
Microsoft typosquats were at 61%, Twitter 74%, Facebook 81%, Google 83% and Apple at 86%. Clearly, there is a significant typosquatting ecosystem around high-profile, often-typed domain names.
Grabbing the data
Altogether, two-thirds of our 2249 possible typosquat domains (1502 in total) resolved using DNS. With automation scripts written in a combination of Bash, Python, Lua and AppleScript – a weirdly wonderful tool for controlling OS X applications – we browsed to each site in Safari, exactly as if a user had typed in the fully-qualified domain name in the address bar.
We relaunched Safari 5.1.1 and performed a browser reset before visiting each site. This ensured that there were no cookies, cached files or other browser history to influence the results.
Using a custom-written web proxy, we recorded all URLs and complete HTTP traffic for every visit. We also took a screen snapshot of the resulting web page after Safari had been in action for nine seconds.
Then we crunched the data to see what we could learn about the typosquatting industry.
Our first, and happy, surprise was that we weren’t overrun with malware.
We recently surveyed a batch of lost USB keys bought from a transit authority’s Lost Property auction; we hoped that the infection rate would be about 10%, but found that 66% of the keys in our study were infected.
So we naively assumed that typosquat sites would be similarly incautious (either by accident or design) about malware. But out of 14,495 URLs downloaded in browsing to the 1502 sites on our list, only one contained malware. That’s just 0.01% by URL, and 0.07% by fully-qualified domain name.
With hindsight, however, this is not surprising.
There are only so many plausible ways to mis-spell words like Facebook and Sophos, so typosquatters have an interest in avoiding malware. Unlike fake viagra sellers or scareware peddlers, typosquatters can’t just get up and move on if one of their domains gets an unarguably malicious reputation.
Our second observation was that, despite the absence of malware, typosquats are by no means harmless.
We looked up each of the 14,495 URLs in our saved traffic using SophosLabs data. (This is the same detection and classification that users of Sophos Web Security and Control enjoy.)
384 of the URLs (2.7%) downloaded when visiting a typosquat site fell into the loose category of cybercrime. That means they are, or have been, associated with hacking, phishing, online fraud or spamming. And 354 of the URLs (2.4%) were adult or dating sites.
Even if you tolerate adult sites yourself, you don’t want to expose your workplace or your children to them. Typosquatters have no such qualms.
The other categories highlighted in our breakdown give a high-level insight into the typosquatting ecosystem. Unsurprisingly, 15% of the URLs were tagged as advertising sites and popups; 12% related to IT and hosting, representing the large number of typosquats which offer to sell on a possibly-interesting domain name (domain parking, as it is often called) whilst mopping up undeserved click revenue; and 6% were classified as search sites.
Locating the squatters
The third issue we examined was the location of the servers hosting the typosquatting URLs.
As you might expect, the USA topped the list, hosting nearly two-thirds of the servers; Germany, China and the UK came in the next three spots. The British Virgin Islands (population approximately 30,000) and the Cayman Islands (about 60,000), which are offshore financial centres, made it into the top dozen.
Our fourth exercise was to identify and comment upon any prominent or interesting subcultures in the typosquatting community.
Ignoring the 354 adult and dating sites, which make up 2.4% of the URLs in our list, several categories attracted our attention. Many typosquat pages fall into more than one category:
- Domain parking and domains for sale
- “Related search” pages
- Competitions and surveys
- Passing off
- Oddball humour and satire
- Fellow typosquatting researchers
Several domains-for-sale providers popped up across all six of the domains we analysed, from Apple to Twitter. The main player in managing page content for typosquat domains, including the “related search” links on typosquat pages, is Google’s DoubleClick subsidiary.
More than 560 of the 1502 pages (37%) in our test made use of DoubleClick, which serves numerous domain parking businesses, including Bodis, Oversee, Sedo and Demand Media. You’ll probably recognise the look of parked domains from these companies, as they pop up all over the internet, not just on typosquatting sites.
The most overt bait-and-switch operation in our study abused the Apple and iTunes brands, squatting on 52 of the 241 Apple-related domains (22%) in our list, from http://www.abpple.com to http://www.applze.com. These typosquat sites all redirect to a pair of domains named live-online-istore and mp3helpdesk.
The company behind this bait-and-switch is registered in Jersey in the Channel Islands, has an accommodation address in Harley Street, London, and operates its servers out of Canada.
The trick is simple. If you fat-finger http://www.apple.com and end up at the live-online-istore site, you see an Apple-like page.
This page appears to offer you iTunes software downloads for Windows and Mac. The “Download iTunes” button is the bait.
There is no iTunes download. If you click the button, you are whisked off to the mp3helpdesk site, which now claims to be offering you “unlimited downloads for just 0.99 a month”.
In truth, what you are paying for is just access to technical help forums for a selection of free software for file sharing and for playing audio and video files.
The small print does, in fact, tell you this, but you would be forgiven for not spotting it. “Unlimited downloads” merely refers to the plethora of peer-to-peer files, legal and illegal, already available for free online.
Other brand misuse amongst our samples involved directly passing off the typosquat domain as the real thing. Google was the most commonly-abused brand, since it is trivial for a third-party site to present a Google-like search page and to use Google’s search engine behind the scenes.
This sort of brand abuse can generate revenue in several ways.
By presenting sponsored links as organic search results, the fake site earns click-through revenue more readily. By mixing other revenue-generating links into real search results, the brand abusers can hide their inorganic and even unrelated links amongst otherwise-high-quality results.
Of course, by visually presenting its so-called search engine as a well-known brand, the fake site doesn’t even look like a typosquat.
The lighter side
It wasn’t all doom, gloom and repetition amongst our typosquatters, however.
We found occasional humour and satire, as shown above, and on two of the sites, we came across fellow researchers in the typosquatting field, as shown below.
But there were still plenty of risky URLs to which our browser was exposed simply by starting with a typosquat domain.
Of the 14,495 URLs in our downloaded collection, 738 (5.1%) were categorised by SophosLabs as cybercrime or adult. The former should always be blocked; the latter should be blocked at least in the workplace or around children.
Of course, even if you take technological precautions, it is almost inevitable that you will end up on an unintended website from time to time. That’s because the scale of the typosquatting industry is just so large: over 80% of all possible one-character variants of the domains of Facebook, Google and Apple are both registered and resolvable.
If you find yourself somewhere you didn’t intend due to a fat-finger error, don’t be tempted to click through from the unexpected page, even if what you are apparently offered is a link to your intended destination.
At the very best, typosquats which lead to parked domains are just aiming to make money out of nothing, by capitalising on your errors.
At worst, typosquatters are trying to give you a false sense of safety, with the intention of misleading you further into unintended and possibly risky online actions.
Why trust a site you didn’t want to visit in the first place? Why feed an economy which is based upon profiting from other people’s mistakes?