Gmail introduces filters for non-Latin characters, weeding out more phishing emails

Just one week after Google announced that it was to become the first major email provider to adopt the Internet Engineering Task Force (IETF) standard for addresses containing non-Latin and accented characters, it has had to introduce filters to minimise the risks posed by the change.

While many will see the IETF standard as a positive step that allows people around the world to better represent their names via the use of accents and other regional symbols, it is not without its problems.

As Mark Risher of Google’s spam and abuse team explains in a post on the company’s online security blog:

Scammers can exploit the fact that ဝ, ૦, and ο look nearly identical to the letter o, and by mixing and matching them, they can hoodwink unsuspecting victims. Can you imagine the risk of clicking "ShဝppingSite" vs. "ShoppingSite" or "MyBank" vs. "MyBɑnk"?

(Risher explains that the three characters above are the Myanmar letter Wa (U+101D), the Gujarati digit zero (U+0AE6) and the Greek small letter omicron (U+03BF), followed by the ASCII letter ‘o’.)
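The code points behind those lookalikes are easy to inspect. The short Python sketch below is only an illustration: the helper name describe is made up for this example, and it relies on nothing more than the standard unicodedata module.

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The three lookalikes from the quote, followed by the ASCII letter 'o'.
for ch in ["\u101d", "\u0ae6", "\u03bf", "o"]:
    print(describe(ch))
```

All four render as a small circle in most fonts, yet each is a different character as far as software is concerned.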

By using the Gujarati zero, for example, a spammer could have sent an email from supp૦rt@g૦૦gle.c૦m that would, to most eyes, have appeared to come from Google itself. Getting such an email past a spam filter could have proven highly lucrative if the right hook were employed.

For that reason, Google is adding new spam filtering based upon rules put together by the Unicode Consortium.

Google believes that “Restriction Level Detection” strikes a balance between legitimate and abusive use of new domains. It should protect Gmail users from obfuscated email addresses that use lookalike non-Latin characters by applying a rule that states:

The authenticating domain, envelope From domain, payload From domain, reply-to domain, and sender domain should not violate the highly restrictive Unicode Security Profile guidelines for international domain names.

The newly employed “highly restrictive” standard states that all characters in a string should come from the same script, unless they come from one of these combinations:

  • Latin + Han + Hiragana + Katakana
  • Latin + Han + Bopomofo
  • Latin + Han + Hangul
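The rule above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Google's implementation: Python's standard library does not expose the Unicode Script property, so the sketch guesses each character's script from the first word of its Unicode name, which is only a heuristic. The names NAME_PREFIX_TO_SCRIPT, scripts_in and is_highly_restrictive are invented for this example.

```python
import unicodedata

# ASSUMPTION: the script is approximated from the first word of the character's
# Unicode name. A real filter would use the Unicode Script property instead.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin", "GREEK": "Greek", "CYRILLIC": "Cyrillic",
    "CJK": "Han", "HIRAGANA": "Hiragana", "KATAKANA": "Katakana",
    "BOPOMOFO": "Bopomofo", "HANGUL": "Hangul",
    "GUJARATI": "Gujarati", "MYANMAR": "Myanmar",
}

# The three mixed-script combinations listed in the article.
ALLOWED_COMBINATIONS = [
    {"Latin", "Han", "Hiragana", "Katakana"},
    {"Latin", "Han", "Bopomofo"},
    {"Latin", "Han", "Hangul"},
]

def scripts_in(label):
    """Collect the (approximate) scripts used by the characters of a label."""
    scripts = set()
    for ch in label:
        if ch.isascii() and not ch.isalpha():
            continue  # ASCII digits, hyphens and dots are treated as script-neutral
        prefix = unicodedata.name(ch, "").split(" ")[0]
        scripts.add(NAME_PREFIX_TO_SCRIPT.get(prefix, "Unknown"))
    return scripts

def is_highly_restrictive(label):
    """True if the label uses one script or one of the allowed combinations."""
    scripts = scripts_in(label)
    if "Unknown" in scripts:
        return False  # be conservative about characters the heuristic cannot place
    if len(scripts) <= 1:
        return True
    return any(scripts <= combo for combo in ALLOWED_COMBINATIONS)

print(is_highly_restrictive("google"))              # plain Latin
print(is_highly_restrictive("g\u0ae6\u0ae6gle"))    # Latin mixed with Gujarati zeros
print(is_highly_restrictive("tokyo\u6771\u4eac"))   # Latin + Han
```

Under this sketch the spoofed g૦૦gle label is rejected because it mixes Latin with Gujarati, while a Latin and Han label passes. Lookalikes that sit inside a single script, such as the Latin ɑ in the MyBɑnk example, are not caught by a script-mixing check alone.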

Google rolled the changes out on Tuesday, saying that it hopes other email providers will soon follow suit.

Together, we can help ensure that international domains continue to flourish, allowing both users and businesses to have a tête-à-tête in the language of their choosing.

Image of fish courtesy of Shutterstock.