can locate copies of a piece of data, such as a keyword index, if the original is out of commission.
"You make the software tolerate failures. If you can expect failures, then this is what makes cheap commodity PCs viable for Internet services," Hoelzle said.
Google's PC servers, which number in the thousands, run a stripped-down version of Linux. It is based on the Red Hat distribution but is really just the operating system kernel modified for Google, he added.

Urs Hoelzle, VP of operations and of engineering
The company has also devised a system for handling massive amounts of data and returning rapid responses to queries. Google splits the Web into millions of pieces, or "shards" in Google tech speak, which are replicated in case of failure.
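The sharding-with-replication idea can be illustrated with a minimal sketch. This is not Google's code: the server names, the MD5-based placement, and the function names `shard_for`, `replicas_for` and `first_live_replica` are all assumptions made for illustration.

```python
import hashlib
from typing import List, Optional, Set


def shard_for(url: str, num_shards: int) -> int:
    """Map a URL to a shard number using a stable hash (an illustrative choice)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(shard: int, servers: List[str], copies: int) -> List[str]:
    """Place `copies` replicas of a shard on consecutive servers."""
    return [servers[(shard + i) % len(servers)] for i in range(copies)]


def first_live_replica(replicas: List[str], down: Set[str]) -> Optional[str]:
    """Return the first replica that has not failed, or None if all are down."""
    for server in replicas:
        if server not in down:
            return server
    return None


# Hypothetical cluster of four commodity PCs.
servers = ["pc-0", "pc-1", "pc-2", "pc-3"]
shard = shard_for("http://example.com/", num_shards=8)
replicas = replicas_for(shard, servers, copies=3)

# Even with the first replica out of commission, another copy can answer.
survivor = first_live_replica(replicas, down={replicas[0]})
```

With three copies spread over four machines, a query for the shard still finds a live server after any single failure.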
Not surprisingly, the company creates an index of words that appear on the Web, which it stores as an array of large files. But it also has document servers, which hold copies of Web pages that Google crawls and downloads.
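A word index paired with a document store might look roughly like the sketch below. It is a toy, in-memory illustration under stated assumptions: the page URLs and text are invented, and `build_index` and `lookup` are hypothetical names, not anything described in the talk.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build a word -> set-of-URLs index from a URL -> page-text store."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def lookup(index: Dict[str, Set[str]], pages: Dict[str, str], word: str) -> List[Tuple[str, str]]:
    """Find matching URLs in the index, then fetch the stored page copies."""
    urls = sorted(index.get(word.lower(), set()))
    return [(url, pages[url]) for url in urls]


# The dict plays the role of the document servers (hypothetical content).
pages = {
    "a.example": "Cheap PCs fail",
    "b.example": "PCs run Linux",
}
index = build_index(pages)
results = lookup(index, pages, "Linux")
```

The index answers "which pages contain this word", while the page store supplies the copies themselves.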
Another important engineering feat, according to Hoelzle, is that Google has made it very straightforward to write programs that run across thousands of servers. Normally, building applications to run in a "parallel" configuration of servers requires specialized tools and skills.
Google's programming tool, called MapReduce, which automates the task of recovering a program in case of a failure, is critical to keeping the company's costs down.
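The general map-and-reduce pattern, with failed work simply run again, can be sketched as follows. This is a single-process illustration and not Google's MapReduce interface: the names `map_words`, `reduce_counts`, `run_with_retry`, `make_flaky` and `map_reduce` are assumptions, and a `RuntimeError` stands in for a machine failure.

```python
from collections import defaultdict


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in a chunk of text."""
    return [(word, 1) for word in chunk.lower().split()]


def reduce_counts(word, counts):
    """Reduce step: add up all the counts collected for one word."""
    return sum(counts)


def run_with_retry(task, arg, max_attempts=3):
    """Re-run a failed task, so the programmer never handles recovery by hand."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == max_attempts:
                raise


def make_flaky(mapper, failures):
    """Wrap a mapper so its first `failures` calls raise (simulated crashes)."""
    state = {"left": failures}

    def wrapper(chunk):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated machine failure")
        return mapper(chunk)

    return wrapper


def map_reduce(chunks, mapper, reducer):
    """Map every chunk (with retries), group values by key, then reduce."""
    grouped = defaultdict(list)
    for chunk in chunks:
        for key, value in run_with_retry(mapper, chunk):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


chunks = ["the web the index", "the shards"]
counts = map_reduce(chunks, map_words, reduce_counts)
# A mapper that crashes once still yields the same result after a retry.
counts_after_failure = map_reduce(chunks, make_flaky(map_words, 1), reduce_counts)
```

The programmer writes only the map and reduce functions; the retry logic lives in the framework, which is the cost saving the article describes.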
"Cost is really the sum of what the equipment you need to do the work costs and how much programming time you need to put into getting something useful," Hoelzle said, adding that Google has started using MapReduce more widely over the past year.
Finally, Google has created "batch" job scheduling software that acts as a sort of taskmaster for millions of operations. Called the Global Work Queue, it breaks up computing jobs into many smaller tasks and distributes them across machines.
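Breaking a job into smaller tasks and handing them out can be sketched in a few lines. The round-robin policy and the names `split_job` and `distribute` are assumptions chosen for illustration; the article does not describe how the Global Work Queue actually assigns work.

```python
from collections import deque


def split_job(items, task_size):
    """Break one large job into tasks of at most `task_size` items each."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def distribute(tasks, machines):
    """Hand tasks to machines in round-robin order until the queue is empty."""
    queue = deque(tasks)
    assignment = {machine: [] for machine in machines}
    while queue:
        for machine in machines:
            if not queue:
                break
            assignment[machine].append(queue.popleft())
    return assignment


# A hypothetical job of ten work items spread over three machines.
tasks = split_job(list(range(10)), task_size=3)
assignment = distribute(tasks, ["pc-0", "pc-1", "pc-2"])
```

A production scheduler would also track machine load and reassign tasks from failed machines, which this sketch leaves out.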
For all its built-in redundancy in case of failure, the system doesn't address all problems, Hoelzle revealed. During the presentation, he showed a photo of six fire trucks responding to an emergency at a Google data center in an undisclosed location.
He would not reveal any specific details on the mishap except to say that "it wasn't about one machine going down."
In a follow-up interview with CNET News.com, Hoelzle said the cost of power is another important factor in Google's data center designs.
"The physical cost of operations, excluding people, is directly proportional to power costs," he said. "(Power) becomes a factor in running cheaper operations in a data center. It's not just buying cheaper components but you also have to have an operating expense that makes sense."

NWLB
*****
http://www.nwlbnet.blogspot.com
It seems that between suddenly dropped sites and too much emphasis on link analysis, it is becoming more of an effort to get good results for some searches.
- Not all that new news...
- by March 7, 2005 2:16 AM PST
- One of their engineers was talking about this 4 months ago... the video is here http://www.uwtv.org/programs/displayevent.asp?rid=2459