
March 3, 2005 4:00 AM PST

Google's secret of success? Dealing with failure

(continued from previous page)

can locate copies of a piece of data, such as a keyword index, if the original is out of commission.

"You make the software tolerate failures. If you can expect failures, then this is what makes cheap commodity PCs viable for Internet services," Hoelzle said.

Google's PC servers, which number in the thousands, run a stripped-down version of Linux. It is based on the Red Hat distribution but is really just the operating system kernel, modified for Google, he added.

Urs Hoelzle, VP of operations and of engineering, Google

The company has also devised a system for handling massive amounts of data and returning rapid responses to queries. Google splits the Web into millions of pieces, or "shards" in Google tech speak, which are replicated in case of failure.
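The article does not say how Google assigns data to shards or how it picks a replica, so the following is only a minimal sketch of the general idea, assuming hash-based sharding and a fixed number of copies per shard. The function names (`shard_for`, `replicas_for`, `pick_live_replica`) and the server names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key (for example a URL) to a shard number. Hashing is an assumption."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(shard: int, servers: list, copies: int) -> list:
    """Place `copies` replicas of a shard on consecutive servers, wrapping around."""
    return [servers[(shard + i) % len(servers)] for i in range(copies)]


def pick_live_replica(replicas: list, down: set) -> str:
    """Return the first replica that is not known to be down."""
    for server in replicas:
        if server not in down:
            return server
    raise RuntimeError("all replicas of this shard are down")


if __name__ == "__main__":
    servers = ["pc-a", "pc-b", "pc-c", "pc-d"]  # hypothetical machine names
    shard = shard_for("http://example.com/", num_shards=8)
    replicas = replicas_for(shard, servers, copies=3)
    # If the first copy is out of commission, the lookup falls through to the next.
    print(pick_live_replica(replicas, down={replicas[0]}))
```

With three copies per shard, a query can still be answered when one or two of the machines holding that shard have failed, which is the property Hoelzle describes.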

Not surprisingly, the company creates an index of words that appear on the Web, which it stores as an array of large files. But it also has document servers, which hold copies of Web pages that Google crawls and downloads.

Another important engineering feat, according to Hoelzle, is that Google has made it very straightforward to write programs that run across thousands of servers. Normally, building applications to run in a "parallel" configuration of servers requires specialized tools and skills.

Google's programming tool, called MapReduce, which automates the task of recovering a program in case of a failure, is critical to keeping the company's costs down.
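To give a feel for the programming model, here is a single-process toy in Python. It is not Google's MapReduce API, which the article does not show. The programmer writes only a map function and a reduce function, and a small driver (here the hypothetical `map_reduce` and `run_with_retry`) reruns a task that fails. The word-count job and the retry limit are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(word, counts):
    """Reduce step: add up all the counts emitted for one word."""
    return word, sum(counts)


def run_with_retry(task, arg, max_attempts=3):
    """Rerun a failed task, standing in for automatic recovery."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(arg)
        except RuntimeError as error:  # simulated machine failure
            last_error = error
    raise RuntimeError("task failed after retries") from last_error


def map_reduce(documents, mapper=map_words, reducer=reduce_counts):
    """Run the mapper over every document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for document in documents:
        for key, value in run_with_retry(mapper, document):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(map_reduce(["the web", "the index"]))
```

The point of the design is that failure handling lives in the driver, so the author of `map_words` and `reduce_counts` never writes recovery code.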

"Cost is really the sum of what the equipment you need to do the work costs and how much programming time you need to put into getting something useful," Hoelzle said, adding that Google has started using MapReduce more widely over the past year.

Finally, Google has created "batch" job scheduling software that acts as a sort of taskmaster for millions of operations. Called the Global Work Queue, it breaks up computing jobs into many smaller tasks and distributes them across machines.
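The article gives no detail on how the Global Work Queue schedules tasks. As a rough sketch of the "break up and distribute" idea only, the following assumes fixed-size tasks handed out round-robin; `split_job`, `distribute`, and the machine names are hypothetical.

```python
from collections import deque


def split_job(items, task_size):
    """Break one large job into tasks of at most `task_size` items each."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def distribute(tasks, machines):
    """Hand tasks to machines in round-robin order (an assumed policy)."""
    queue = deque(tasks)
    assignments = {machine: [] for machine in machines}
    turn = 0
    while queue:
        machine = machines[turn % len(machines)]
        assignments[machine].append(queue.popleft())
        turn += 1
    return assignments


if __name__ == "__main__":
    tasks = split_job(list(range(5)), task_size=2)
    print(distribute(tasks, ["m1", "m2"]))
```

A real scheduler would also have to account for machine load and failures; this sketch leaves both out.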

For all its built-in redundancy in case of failure, the system doesn't address all problems, Hoelzle revealed. During the presentation, he showed a photo of six fire trucks responding to an emergency at a Google data center in an undisclosed location.

He would not reveal any specific details on the mishap except to say that "it wasn't about one machine going down."

In a follow-up interview with CNET News.com, Hoelzle said the cost of power is another important factor in Google's data center designs.

"The physical cost of operations, excluding people, is directly proportional to power costs," he said. "(Power) becomes a factor in running cheaper operations in a data center. It's not just buying cheaper components but you also have to have an operating expense that makes sense."

Comments (22)

Crash once every three years?????
by unixrules March 3, 2005 8:15 AM PST
He obviously has not been using Windows.

Yeah!
by NWLB March 3, 2005 8:22 AM PST
And god knows he doesn't have racks of Macs to deal with.
NWLB
http://www.nwlbnet.blogspot.com

right
by David Arbogast March 3, 2005 8:58 AM PST
You are right, he's not using Windows. He's using Linux, and he claims to have systems crash every single day.

Are Google's SERPs as Relevant as they were 2 Years Ago?
by aa March 5, 2005 6:57 PM PST
The bottom line is relevancy. The labs add-ons are certainly very helpful. But have Google's SERPs decreased in relevancy in the past two years?

It seems that between suddenly dropped sites and too much emphasis on link analysis, it is becoming more of an effort to get good results for some searches.

Yahoo already did this.
by March 6, 2005 10:05 PM PST
I mean they're not as sophisticated as Google, but Yahoo has pretty much done the same thing already for many years.

Not all that new news...
by March 7, 2005 2:16 AM PST
One of their engineers was talking about this 4 months ago... the video is here: http://www.uwtv.org/programs/displayevent.asp?rid=2459