Domain flub leaves 30 million customers high and dry

The CEO of cloud software and services company Zoho was left begging Twitter users for help on Monday after his domain registrar effectively took the company offline, stranding millions of users.

The drama started at 8:22am PCT, when Zoho’s founder and CEO Sridhar Vembu took to Twitter with a complaint about Zoho’s domain registrar, TierraNet. The registrar had taken Zoho’s domain down and he couldn’t reach senior management to get it reinstated.

A domain registrar is the company that reserves a domain name for a client to use on the internet, and then keeps that record alive so that it continues to resolve to an actual IP address.

If a domain registrar decides to take down that domain name, it effectively removes the client’s online address from the domain name system (DNS), which is the web’s address book. That means that when you type the domain into your browser, the lookup fails and you get an error saying the site can’t be found rather than seeing the website. The client’s computers may still be running, but you can’t reach them.
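To make that concrete, here is a minimal Python sketch of what a client sees when a name drops out of DNS. The function name `lookup` and its injectable `resolver` parameter are illustrative assumptions, not anything Zoho or TierraNet used; the only real API involved is the standard library's `socket.getaddrinfo`.

```python
import socket


def lookup(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if DNS has no answer.

    `resolver` is a hypothetical hook so the lookup can be swapped out;
    by default it is the standard library's socket.getaddrinfo.
    """
    try:
        results = resolver(hostname, 443)
    except socket.gaierror:
        # The name did not resolve. The servers behind it may be perfectly
        # healthy, but without an address nobody can find them.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({info[4][0] for info in results})
```

Calling `lookup` on a domain that has been pulled from DNS returns an empty list, even though nothing has changed on the company's own machines.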

For Zoho, this was a big deal. The company is huge. It has 30 million users and 5,000 employees worldwide. It provides cloud-based software solutions ranging from email to CRM, invoicing, IT and helpdesk software. Its customers range from HP to Hyatt Hotels. So when its site is not available, people notice, and the complaints soon began appearing on Twitter.

When Zoho customers heeded Vembu’s online complaint and began complaining to TierraNet, the registrar told them that it had taken down Zoho’s domain after receiving complaints of phishing attacks using Zoho’s email service.

In messages to Zoho users who were complaining about the outage, TierraNet’s support staff said that they had tried to contact the company to no avail. Vembu responded that the company had received three complaints, and had only one investigation pending.

In a blog post explaining the incident, Vembu alleged that this was the result of an automated script rather than a human decision, calling out TierraNet for not consulting further with Zoho:

Somehow this automated algorithm decided to shut down the Zoho domain based on these 3 cases – without prior warning of the shutdown, or investigation into the traffic supported by this domain.

While Vembu has been actively apologizing to customers and calling out TierraNet on Twitter, the domain registrar did not reciprocate on social media. Its own Twitter account was last updated almost a year ago. It consists mostly of messages acknowledging its own service outages from 2015 and 2017.

Zoho quickly switched to Cloudflare as its domain registrar, which enabled it to get its domain re-listed in DNS records. However, because the computers that hold DNS records cache them for a period of time to cut down on the number of requests they must make, it took some time for the new entry to propagate throughout the DNS. The lag led to further complaints from users.
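The propagation lag can be illustrated with a toy model of a caching resolver. This is a simplified sketch, not how any particular DNS server is implemented: the class name `CachingResolver`, the dictionary standing in for the authoritative records, and the one-hour time to live (TTL) are all assumptions made for illustration.

```python
class CachingResolver:
    """Toy resolver that remembers each answer until its TTL runs out."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: name -> address (source of truth)
        self.ttl_seconds = ttl_seconds
        self.cache = {}  # name -> (address, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            # Cache entry is still fresh, so the source of truth is not consulted.
            return cached[0]
        # Entry missing or expired: ask the source of truth and cache the answer,
        # including a "no such name" answer (None).
        address = self.authoritative.get(name)
        self.cache[name] = (address, now + self.ttl_seconds)
        return address


records = {"example.com": None}             # domain has been taken down
resolver = CachingResolver(records, ttl_seconds=3600)

during_outage = resolver.resolve("example.com", now=0)
records["example.com"] = "192.0.2.10"       # domain is re-listed
one_minute_later = resolver.resolve("example.com", now=60)
after_ttl = resolver.resolve("example.com", now=3600)
```

In this model the resolver keeps returning the stale "not found" answer one minute after the record is restored, and only picks up the new address once the cached entry expires. Users whose resolvers had cached the outage were in the same position until their caches refreshed.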

It also meant that Vembu found himself, the cofounder and CEO of a massive online services company, giving tech advice online to help get the message out from Zoho’s official support team. He was explaining how to change DNS servers to point to Cloudflare and avoid the propagation lag.

Customers seemed pretty understanding on the whole, and many praised Vembu for owning the situation and being transparent on Twitter.

Nevertheless, it raises the question: why was the CEO of a massive online software company reduced to publicly begging for someone to help him get through to the provider that effectively controls his entire online presence?

Why was there not at least a backup domain that Zoho could have posted on Twitter while it resolved the issue? As Vembu said in one of his many contrite tweets, the company learned a valuable lesson on Monday.

He’s acting on it, though, taking steps to make sure that he isn’t caught out again. He concluded in his blog:

You have my assurance that nothing like this will ever happen again. We will not let our fate be determined by the automated algorithms of others. We will be a domain registrar ourselves.

That would go some way towards closing what was a wide-open hole in the company’s risk strategy, and should send other online companies running to check for single points of failure in their own setups.