How a Network Snafu in Indonesia Blocked Google in California

You know the story about a butterfly flapping its wings in Brazil and causing a tornado in Texas? Well, it looks like an error at an internet service provider in Indonesia prevented as much as 5 percent of the world’s internet users from accessing Google for about 27 minutes on Monday evening California time.

That’s the word from Tom Paseka, a network engineer with the California-based internet security company Cloudflare, who discussed the snafu in a short blog post on Monday. Paseka says that at around 6:24 p.m. PDT, he noticed that he and his California co-workers were unable to access Google services, and that there were many complaints on Twitter from users around the world who were having similar issues. After a little digging, he discovered that the problem could be traced not to Google but to an Indonesian ISP called Moratel, and the way he tells it, a quick call to the ISP set things right.

The problem was short-lived, but it shows how tenuous internet access can be. There’s a reason it’s called the internet. Our worldwide network is a collection of disparate services controlled by countless companies, governments, and individuals, and a problem with one system can have a knock-on effect on others.

As Paseka explains it, the internet is a collection of networks called autonomous systems. Autonomous systems communicate with each other through the Border Gateway Protocol, or BGP, which is a system for exchanging information about routes from one location in the network to another. If you want to access Google, your ISP needs to have a route from your computer to Google’s servers.

“BGP is largely a trust-based system,” he writes. “Networks trust each other to say which IP addresses and other networks are behind them. When you send a packet or make a request across the network, your ISP connects to its upstream providers or peers and finds the shortest path from your ISP to the destination network.”
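To make Paseka's description concrete, here is a minimal Python sketch of that kind of path choice. It is a toy model, not how any real router is implemented: the `Announcement` type and `best_route` function are hypothetical names, the AS numbers and address block come from ranges reserved for documentation, and real BGP weighs several other attributes before it ever compares path lengths.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Announcement:
    """One route a neighbor has advertised (hypothetical toy type)."""
    prefix: str                # destination address block
    as_path: Tuple[int, ...]   # networks the traffic would cross, nearest first


def best_route(announcements: List[Announcement], prefix: str) -> Optional[Announcement]:
    """Pick the advertised route to `prefix` that crosses the fewest networks.

    Nothing here checks whether an announcement is true. Like the
    trust-based system Paseka describes, it believes what it is told.
    """
    candidates = [a for a in announcements if a.prefix == prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda a: len(a.as_path))


# Two neighbors both claim they can reach the same (documentation) prefix.
announcements = [
    Announcement("192.0.2.0/24", (64500, 64501, 64510)),
    Announcement("192.0.2.0/24", (64502, 64510)),
]

chosen = best_route(announcements, "192.0.2.0/24")
print(chosen.as_path)  # (64502, 64510)
```

The shorter path wins simply because it is shorter, which is why a wrong announcement can be so disruptive.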

What happened is that Moratel began providing incorrect routing information to its upstream provider PCCW, which passed the bad information along to its peers. Even though Google’s servers were online and available, many users — mostly in Hong Kong, Paseka speculates — were unable to access Google’s services because their ISPs had the wrong routing information.
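A rough simulation shows how one bad announcement can spread from a customer to its provider and on to the provider's peers. This is a simplified sketch and not a reconstruction of the Moratel incident: the topology, the AS numbers (taken from the documentation range), and the `announce` function are all invented for illustration, and each network here simply prefers the shortest path it hears.

```python
from collections import deque

# Hypothetical topology: who exchanges routes with whom.
neighbors = {
    64510: [64505],                       # the real destination network
    64505: [64510, 64501],                # its transit provider
    64501: [64505, 64502, 64503, 64499],  # a large upstream provider
    64502: [64501],                       # peer of the upstream
    64503: [64501],                       # another peer
    64499: [64501],                       # small customer ISP
}


def announce(tables, neighbors, origin):
    """Flood a claim that `origin` hosts the destination.

    `tables` maps each network to the path it currently uses (ending at
    the claimed origin). A network accepts any shorter path it is offered
    and passes it along; nobody verifies the claim.
    """
    tables[origin] = ()
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        offered = (sender,) + tables[sender]
        for receiver in neighbors[sender]:
            if receiver in offered:
                continue  # loop prevention: never accept a path through yourself
            current = tables.get(receiver)
            if current is None or len(offered) < len(current):
                tables[receiver] = offered
                queue.append(receiver)


tables = {}
announce(tables, neighbors, 64510)   # legitimate announcement
announce(tables, neighbors, 64499)   # mistaken announcement from the customer

# Networks whose traffic now ends up at the wrong place.
misdirected = sorted(asn for asn, path in tables.items()
                     if path and path[-1] == 64499)
print(misdirected)  # [64501, 64502, 64503]
```

In this toy network the destination never goes offline. The upstream provider and both of its peers just stop sending traffic toward it.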

When Paseka figured out what was going on, he contacted a colleague at Moratel who was able to fix the error. In no time, Google was accessible again. “I’m sure Google monitored and were investigating the issue, but usually back channels work faster than official paths,” Paseka writes in the comments of his post.

“It is unlikely this was malicious, but rather a misconfiguaration (sic) or an error evidencing some of the failings in the BGP Trust model,” he writes.

As Paseka notes, this isn’t the first time something like this has happened. Internet monitoring company Renesys has tracked these sorts of disruptions on its blog for years, and they usually happen when a customer like Moratel accidentally passes bad info to a provider like PCCW. Pakistan Telecom broke YouTube for many users, even outside of Pakistan, in 2008. There was a big disruption caused by a Czech ISP in 2009, and a smaller one caused by China Telecom in 2010.

With these things happening about once a year, why hasn’t anyone fixed the system? Isn’t there wide potential for abuse? Renesys CTO Jim Cowie says the threats of route hijacking or leakage are serious enough that companies need to be aware of the issue, but that the effects are generally short-term as the problems tend to be found and fixed fairly quickly.

He says that although various solutions have been proposed, the process of changing over to an alternate system is difficult, even if an adequately resilient alternative were designed. “There are more than 45,000 autonomous systems in the internet, and they have to independently make up their minds that this is worth doing,” he says. “If the costs of compliance are nontrivial, and the individual benefits are hard to quantify until the protocol is globally deployed, it’s going to take a long, long time to reach critical mass.”

“There’s also some concern over any scheme that puts the power of route validation in some external authority’s hands, no matter how trustworthy they are.”

Cloudflare’s Terry Rodery agrees that change will be long and difficult. He thinks the problem will largely be dealt with through self-policing on the part of ISPs.

“This will continue to be an issue until pressure is applied to these ISPs by their peers to implement customer filtering,” he says. “Peering policies of most ISPs require that proper route filtering be put into place for each customer. Those who don’t follow this practice are likely to get rejected for peering or de-peered.”
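The customer filtering Rodery mentions can be pictured as a per-customer allowlist: a provider only accepts announcements for address blocks it has agreed the customer may advertise. The Python sketch below is an assumption about how such a check might look, not any ISP's actual configuration. The `ALLOWED_PREFIXES` table and `accept_announcement` function are hypothetical, and the addresses and AS numbers are documentation values.

```python
import ipaddress

# Hypothetical allowlist: the address blocks each customer may announce.
ALLOWED_PREFIXES = {
    64499: [ipaddress.ip_network("203.0.113.0/24")],
}


def accept_announcement(customer_asn: int, prefix: str) -> bool:
    """Accept a customer's route only if it falls inside its registered blocks."""
    announced = ipaddress.ip_network(prefix)
    for allowed in ALLOWED_PREFIXES.get(customer_asn, []):
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if announced.version == allowed.version and announced.subnet_of(allowed):
            return True
    return False


print(accept_announcement(64499, "203.0.113.0/25"))  # True: inside the customer's block
print(accept_announcement(64499, "192.0.2.0/24"))    # False: someone else's addresses
print(accept_announcement(64500, "203.0.113.0/24"))  # False: unknown customer
```

With a filter like this at the provider's edge, a customer's mistaken announcement for someone else's addresses would be dropped before it could spread to other networks.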
