News Blog

July 1, 2008 11:32 AM PDT

Gnip to bridge the data divide for noisy Web services

by Josh Lowensohn

One of the key concerns for any fledgling start-up is overload. Too many users trying to get at your data is one thing, but dealing with the onslaught of notifications and data pings from connecting services can be quite another.

A new start-up called Gnip is trying to solve this problem by acting as the middleman. Got a service like Twitter that's getting hit from a thousand different directions by services trying to get at its data? Sending any new bits of information to Gnip shifts that onslaught to Gnip's servers instead of yours, which should keep your service running a lot more smoothly, no matter how many folks are using it. ReadWriteWeb is calling it a "Grand Central Station for the social Web."
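To make the middleman idea concrete, here is a minimal sketch of the fan-out pattern in Python. It is not Gnip's actual API; the `NotificationHub` class, its method names, and the "twitter" label are all hypothetical, chosen only to show how a single push from the publisher can replace a thousand incoming pings.

```python
from collections import defaultdict


class NotificationHub:
    """Hypothetical middleman: publishers push once, the hub fans out."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # publisher name -> callbacks
        self.deliveries = 0                   # fan-out work done by the hub

    def subscribe(self, publisher, callback):
        self.subscribers[publisher].append(callback)

    def publish(self, publisher, item):
        # The publisher makes exactly one call, however many consumers exist.
        for callback in self.subscribers[publisher]:
            callback(item)
            self.deliveries += 1


hub = NotificationHub()
inboxes = [[] for _ in range(1000)]           # 1,000 downstream services
for inbox in inboxes:
    hub.subscribe("twitter", inbox.append)

hub.publish("twitter", "new status update")   # the publisher's only request
print(hub.deliveries)                         # deliveries handled by the hub
```

The trade-off is the one raised later in the post: the hub absorbs the load, but it also becomes a single point of failure for everyone who depends on it.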

In a perfect world, services that used this system could open up their APIs a little to encompass more activity, leading to faster third-party tools that take advantage of that data. Users would also be getting faster notifications and conceivably less downtime due to overload.

Sounds great for everyone, right?

Unfortunately, all of this will not be available from the get-go. Gnip is starting out by offering a notification service only, with polling, transformation, and identification coming later. Notifications are one of the main overloaders though, especially for services like Twitter that have had to throttle the number of times any external service can ping them for data. There are also concerns about what happens if everyone starts relying on Gnip to pipe data to third-party tools and Gnip itself goes down--leading to something similar to the blips at Amazon's S3, which have taken out entire businesses for hours at a time.

Gnip was founded by Eric Marcoullier, one of the co-founders of the now Yahoo-owned MyBlogLog.

Gnip bridges the data divide by offloading all the pings off your servers and onto theirs.

(Credit: Gnip)
Originally posted at Webware
June 27, 2008 3:27 PM PDT

Google data-sharing gets authentication option

by Stephen Shankland

Google now supports the open OAuth standard for sharing data through its Google Data interface, a move that could make it easier to tap into information stored at Google properties.

Google headquarters in Mountain View, Calif.

(Credit: Stephen Shankland/CNET News.com)

The Google Data API (application programming interface)--GData for short--provides a conduit whereby other Web sites can slurp out data stored at Google. For personal information, such as photos at Picasa or contacts at Gmail, access to that information requires authentication. OAuth provides a standard way to perform that authentication, which means programmers at least theoretically should have an easier time writing code.
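For a sense of what that standard involves, the sketch below shows the request-signing step of OAuth 1.0 with the HMAC-SHA1 method, using only the Python standard library. The URL, keys, and secrets are placeholders rather than real Google endpoints or credentials, and the sketch leaves out the token exchange that precedes signing, so treat it as an illustration and not a working GData client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    # OAuth 1.0 percent-encoding: escape everything except unreserved characters.
    return quote(str(value), safe="~")


def signature_base_string(method, url, params):
    # Encode each name and value, sort the pairs, then join them for signing.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def sign_hmac_sha1(method, url, params, consumer_secret, token_secret=""):
    # The signing key is the two secrets, each encoded, joined by "&".
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder request; every value here is made up for illustration.
params = {
    "oauth_consumer_key": "example-consumer-key",
    "oauth_token": "example-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1214600000",
    "oauth_nonce": "abc123",
    "oauth_version": "1.0",
}
signature = sign_hmac_sha1(
    "GET", "https://example.com/feeds/contacts", params,
    consumer_secret="example-consumer-secret",
    token_secret="example-token-secret",
)
print(signature)
```

Because every site signs requests the same way, a programmer can reuse one signing routine across services instead of writing a separate authentication scheme for each.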

Google announced the OAuth support Thursday on its Data API blog.

Also Thursday, Google announced that Google Finance is now supported in the Google Data API. That means data could be retrieved to build, for example, a gadget with a live chart showing changing portfolio value.

And since the API permits two-way communications, it also means an outside service could update a user's information at Google Finance, for example with recent stock trades.

Originally posted at Webware
June 19, 2008 8:53 AM PDT

Data Loss Prevention needs a new name--and acronym

by Jon Oltsik

We are an industry of Three Letter Acronyms (TLAs). Everyone tries to categorize what they do with them.

Some, like ERP, stick around for years, while others, like Enterprise Optical Networking (EON), come and go without much fanfare. On occasion, however, the industry creates a TLA to define an industry trend, but as the market and technology develop, the TLA no longer fits.

This explanation aptly describes the situation with Data Loss Prevention (DLP). A few years ago, DLP vendors like Vericept and Vontu made hay by providing a network-based gateway appliance that would scan IP packets looking for confidential data "leakage." When evil Joe in accounting tried to send a spreadsheet of customer credit card numbers to his Hotmail account, DLP boxes could detect and prevent this type of malicious behavior.

Given this heritage, the DLP acronym was appropriate circa 2005, but not in 2008. Why? Gateway DLP packet filtering devices are only part of the story; today's DLP vendors do a heck of a lot more. Tablus is an expert at data discovery. Vericept excels in data classification. Orchestria is really good at policy management and enforcement. As part of Symantec, Vontu is focusing on integrating DLP functionality with other IT operations tasks. Finally, some vendors like Trend Micro and McAfee eschew the network altogether and focus on endpoints.

So if DLP doesn't fit anymore, what does? My colleague Charlotte Dunlap and I suggest we borrow another acronym and rename this category Data Governance, Risk, and Compliance (DGRC). To us, this covers everything that's needed in the data lifecycle, including creation, classification, and policy management/enforcement. Typically, only Gartner acronyms stick, but Charlotte and I have our fingers crossed.

In all seriousness, many large organizations have no idea how much confidential and private data they have or where it is stored--a pretty scary thought. Given this problem, gateway filtering devices aren't enough. We need DGRC policies, processes, and technologies across all data around the enterprise. We need a new acronym that aptly describes this situation, even if it's actually four letters.

Jon Oltsik is a senior analyst at the Enterprise Strategy Group.
June 11, 2008 6:42 AM PDT

Red Hat settles patent suit with Firestar, DataTern

by Dawn Kawamoto

Red Hat announced on Wednesday that it has reached a settlement with Firestar Software and DataTern over a patent infringement lawsuit.

The lawsuit, filed two years ago in a U.S. District Court in Texas, centered on Firestar's patent for linking object-oriented software with relational databases.

Firestar, in its lawsuit, had alleged that JBoss, which Red Hat had acquired, violated its patent with the JBoss Hibernate 3.0 object-relational mapping tool for Java. Hibernate 3.0 had an open license.

Under the settlement, whose financial terms were not disclosed, all software distributed under Red Hat's brands, along with predecessor versions, is covered, as are Red Hat customers that use the software. The settlement also protects derivative works, or combination products, that use covered products from the patent claim.

"Typically, when a company settles a patent lawsuit, it focuses on getting safety for itself," Rob Tiller, Red Hat's assistant general counsel of intellectual property, said in a statement. "But that was not enough for us; we wanted broad provisions that covered our customers."

DataTern became involved in the lawsuit after Firestar assigned the patent to DataTern.

May 30, 2008 4:00 AM PDT

Google spotlights data center inner workings

by Stephen Shankland

SAN FRANCISCO--The inner workings of Google just became a little less secret.

The search colossus has shed only occasional light on its data center operations, but on Wednesday, Google fellow Jeff Dean turned a spotlight on some parts of the operation. Speaking to an overflowing crowd at the Google I/O conference here, Dean managed simultaneously to demystify Google a little while also showing just how exotic the company's infrastructure really is.

Google fellow Jeff Dean

(Credit: Stephen Shankland/CNET News.com)

On the one hand, Google uses more-or-less ordinary servers. Processors, hard drives, memory--you know the drill.

On the other hand, Dean seemingly thinks clusters of 1,800 servers are pretty routine, if not exactly ho-hum. And the software the company runs on top of that hardware, enabling a sub-half-second response to an ordinary Google search query that involves 700 to 1,000 servers, is another matter altogether.

Google doesn't reveal exactly how many servers it has, but I'd estimate it's easily in the hundreds of thousands. It puts 40 servers in each rack, Dean said, and by one reckoning, Google has 36 data centers across the globe. With 150 racks per data center, that would mean Google has more than 200,000 servers, and I'd guess it's far beyond that and growing every day.
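The back-of-the-envelope estimate above works out as follows. Only the 40-servers-per-rack figure comes from Dean; the data center count is "by one reckoning" and the 150 racks per data center is the article's own assumption.

```python
SERVERS_PER_RACK = 40         # stated by Dean
DATA_CENTERS = 36             # "by one reckoning"
RACKS_PER_DATA_CENTER = 150   # the article's assumption, not a Google figure

total_servers = SERVERS_PER_RACK * RACKS_PER_DATA_CENTER * DATA_CENTERS
print(total_servers)  # 216000
```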

Regardless of the true numbers, it's fascinating what Google has accomplished, in part by largely ignoring much of the conventional computing industry. Where even massive data centers such as the New York Stock Exchange or airline reservation systems use a lot of mainstream servers and software, Google largely builds its own technology.

I'm sure a number of server companies are sour about it, but Google clearly believes its technological destiny is best left in its own hands. Co-founder Larry Page encourages a "healthy disrespect for the impossible" at Google, according to Marissa Mayer, vice president of search products and user experience, in a speech Thursday.

To operate on Google's scale requires the company to treat each machine as expendable. Server makers pride themselves on their high-end machines' ability to withstand failures, but Google prefers to invest its money in fault-tolerant software.

"Our view is it's better to have twice as much hardware that's not as reliable than half as much that's more reliable," Dean said. "You have to provide reliability on a software level. If you're running 10,000 machines, something is going to die every day."

Breaking in is hard to do
Bringing a new cluster online shows just how fallible hardware is, Dean said.

In each cluster's first year, it's typical that 1,000 individual machine failures will occur; thousands of hard drive failures will occur; one power distribution unit will fail, bringing down 500 to 1,000 machines for about 6 hours; 20 racks will fail, each time causing 40 to 80 machines to vanish from the network; 5 racks will "go wonky," with half their network packets missing in action; and the cluster will have to be rewired once, affecting 5 percent of the machines at any given moment over a 2-day span, Dean said. And there's about a 50 percent chance that the cluster will overheat, taking down most of the servers in less than 5 minutes and taking 1 to 2 days to recover.

A look at a custom-made Google rack with 40 servers from a modern data center. Infrastructure guru Jeff Dean showed the snapshot at the Google I/O conference.

(Credit: Stephen Shankland-CNET News.com/Jeff Dean-Google)

While Google uses ordinary hardware components for its servers, it doesn't use conventional packaging. And, Dean said, the company currently puts a case around each 40-server rack, an in-house design, rather than using the conventional case around each server.

The company has a small number of server configurations, some with a lot of hard drives and some with few, Dean said. And there are some differences at the larger scale, too: "We have heterogeneity across different data centers but not within data centers," he said.

As to the servers themselves, Google likes multicore chips, those with many processing engines on each slice of silicon. Many software companies, accustomed to better performance from ever-faster chip clock speeds, are struggling to adapt to the multicore approach, but it suits Google just fine. The company already had to adapt its technology to an architecture that spanned thousands of computers, so it has already made the jump to parallelism.

"We really, really like multicore machines," Dean said. "To us, multicore machines look like lots of little machines with really good interconnects. They're relatively easy for us to use."

Although Google requires a fast response for search and other services, its parallelism can produce that even if a single sequence of instructions, called a thread, is relatively slow. That's music to the ears of processor designers focusing on multicore and multithreaded models.

"Single-thread performance doesn't matter to us really at all," Dean said. "We have lots of parallelizable problems."

The secret sauce
So how does Google get around all these earthly hardware concerns? With software--and this is where you might think about dusting off your computer science degree.

A Google data center, circa 2000. Note the fan on the floor to cool servers.

(Credit: Stephen Shankland-CNET News.com/Jeff Dean-Google)

Dean described three core elements of Google's software: GFS (the Google File System), BigTable, and the MapReduce algorithm. And although Google helps with a lot of open-source software projects that helped the company get its start, these packages remain proprietary except in general terms.

GFS, at the lowest level of the three, stores data across many servers and runs on almost all machines, Dean said. Some incarnations of GFS are file systems "many petabytes in size"--a petabyte being a million gigabytes. There are more than 200 clusters running GFS, and many of these clusters consist of thousands of machines.

GFS stores each chunk of data, typically 64MB in size, on at least three machines called chunkservers; master servers are responsible for backing up data to a new area if a chunkserver failure occurs. "Machine failures are handled entirely by the GFS system, at least at the storage level," Dean said.
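A toy model can show the bookkeeping this implies. The sketch below is not GFS code; the `Master` class, the `cs0` to `cs9` server names, and the random placement policy are assumptions made for illustration. It keeps every chunk on three chunkservers and, when one fails, copies the affected chunks to a surviving machine.

```python
import random

REPLICAS = 3  # the article's "at least three machines"


class Master:
    """Toy master that tracks which chunkservers hold each chunk."""

    def __init__(self, chunkservers, seed=0):
        self.live = set(chunkservers)
        self.locations = {}              # chunk id -> set of chunkservers
        self.rng = random.Random(seed)   # seeded so runs are repeatable

    def store(self, chunk_id):
        # Place the chunk on three distinct live chunkservers.
        self.locations[chunk_id] = set(self.rng.sample(sorted(self.live), REPLICAS))

    def handle_failure(self, server):
        # Re-replicate every chunk the failed server held onto a new machine.
        self.live.discard(server)
        for holders in self.locations.values():
            if server in holders:
                holders.discard(server)
                candidates = sorted(self.live - holders)
                holders.add(self.rng.choice(candidates))


master = Master([f"cs{i}" for i in range(10)])
for chunk in range(100):
    master.store(chunk)

master.handle_failure("cs3")
print(all(len(h) == REPLICAS for h in master.locations.values()))
```

A real system would also have to move the data itself and cope with failures of the master; the sketch covers only the replica accounting.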

To provide some structure to all that data, Google uses BigTable. Commercial databases from companies such as Oracle and IBM don't cut the mustard here. For one thing, they don't operate at the scale Google demands, and if they did, they'd be too expensive, Dean said.

BigTable, which Google began designing in 2004, is used in more than 70 Google projects, including Google Maps, Google Earth, Blogger, Google Print, Orkut, and the core search index. The largest BigTable instance manages about 6 petabytes of data spread across thousands of machines, Dean said.

MapReduce, the first version of which Google wrote in 2003, gives the company a way to actually make something useful of its data. For example, MapReduce can find how many times a particular word appears in Google's search index; a list of the Web pages on which a word appears; and the list of all Web sites that link to a particular Web site.

With MapReduce, Google can build an index that shows which Web pages all have the terms "new," "york," and "restaurants"--relatively quickly. "You need to be able to run across thousands of machines in order for it to complete in a reasonable amount of time," Dean said.
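The programming model behind those examples is easy to sketch on a single machine. The code below is not Google's MapReduce; the `map_reduce` helper and the three sample pages are made up for illustration. It runs a map step that emits key-value pairs, groups them by key, and applies a reduce step to each group, once for word counts and once for an index of which pages contain each word.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    # Map: emit (key, value) pairs; shuffle: group by key; reduce: combine.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical pages standing in for a crawl.
pages = {
    "a.example": "new york restaurants",
    "b.example": "new restaurants in york",
    "c.example": "york weather",
}


def count_mapper(page):
    url, text = page
    return [(word, 1) for word in text.split()]


def count_reducer(word, ones):
    return sum(ones)


def index_mapper(page):
    url, text = page
    return [(word, url) for word in set(text.split())]


def index_reducer(word, urls):
    return sorted(urls)


counts = map_reduce(pages.items(), count_mapper, count_reducer)
index = map_reduce(pages.items(), index_mapper, index_reducer)

# Pages containing all three query terms.
matches = set(index["new"]) & set(index["york"]) & set(index["restaurants"])
print(counts["york"], sorted(matches))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across many machines, which is the point Dean makes about running across thousands of them.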

The MapReduce software is seeing increasing use within Google. It ran 29,000 jobs in August 2004 and 2.2 million in September 2007. Over that period, the average time to complete a job has dropped from 634 seconds to 395 seconds, while the output of MapReduce tasks has risen from 193 terabytes to 14,018 terabytes, Dean said.

On any given day, Google runs about 100,000 MapReduce jobs; each occupies about 400 servers and takes about 5 to 10 minutes to finish, Dean said.

That's a basis for some interesting math. Assuming the servers do nothing but MapReduce, that each server works on only one job at a time, and that they work around the clock, that means MapReduce occupies about 139,000 servers if the jobs take 5 minutes each. For 7.5-minute jobs, the number increases to 208,000 servers; if the jobs take 10 minutes, it's 278,000 servers.
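Spelled out, the arithmetic is total server-minutes per day divided by the 1,440 minutes in a day. The function name below is just for illustration; the inputs are the figures Dean gave, and the simplifying assumptions are the ones listed above.

```python
MINUTES_PER_DAY = 24 * 60


def servers_occupied(job_minutes, jobs_per_day=100_000, servers_per_job=400):
    # Server-minutes consumed per day, spread evenly over a 24-hour day.
    return jobs_per_day * servers_per_job * job_minutes / MINUTES_PER_DAY


for minutes in (5, 7.5, 10):
    # Rounded to the nearest thousand servers, as in the text.
    print(minutes, round(servers_occupied(minutes), -3))
```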

My calculations could be off base, but even qualitatively, that's enough computing horsepower to make the mind boggle.

Fault-tolerant software
MapReduce, like GFS, is explicitly designed to sidestep server problems.

"When a machine fails, the master knows what task that machine was assigned and will direct the other machines to take up the map task," Dean said. "You can end up losing 100 map tasks, but can have 100 machines pick up those tasks."

The MapReduce reliability was severely tested once during a maintenance operation on one cluster with 1,800 servers. Workers unplugged groups of 80 machines at a time, during which the other 1,720 machines would pick up the slack. "It ran a little slowly, but it all completed," Dean said.
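The recovery behavior Dean describes can be modeled in a few lines. This is a simplified sketch, not Google's scheduler; the `MapMaster` class, the worker names, and the round-robin assignment are assumptions. The master records which tasks each worker holds, and when a worker dies it hands that worker's tasks to the survivors.

```python
from collections import defaultdict


class MapMaster:
    """Toy master that remembers which map tasks each worker was given."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.assigned = defaultdict(list)   # worker -> tasks in progress

    def assign(self, tasks):
        # Spread tasks round-robin over the workers that are still alive.
        for i, task in enumerate(tasks):
            self.assigned[self.workers[i % len(self.workers)]].append(task)

    def worker_failed(self, worker):
        # Take the dead worker's tasks and redistribute them.
        self.workers.remove(worker)
        orphaned = self.assigned.pop(worker, [])
        self.assign(orphaned)
        return orphaned


master = MapMaster(f"w{i}" for i in range(101))
master.assign(range(10_100))                # 100 tasks per worker

lost = master.worker_failed("w0")
print(len(lost), len(master.workers))       # 100 tasks moved to 100 survivors
```

With these made-up numbers the failure costs each surviving machine one extra task, which is the "lose 100 map tasks, have 100 machines pick them up" case from the quote.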

And in a 2004 presentation, Dean said, one system withstood a failure of 1,600 servers in a 1,800-unit cluster.

Next-generation data center to-do list
So all is going swimmingly at Google, right? Perhaps, but the company isn't satisfied and has a long to-do list.

Most companies are trying to figure out how to move jobs gracefully from one server to another, but Google is a few orders of magnitude above that challenge. It wants to be able to move jobs from one data center to another--automatically, at that.

"We want our next-generation infrastructure to be a system that runs across a large fraction of our machines rather than separate instances," Dean said.

Right now some massive file systems have different names--GFS/Oregon and GFS/Atlanta, for example--but they're meant to be copies of each other. "We want a single namespace," he said.

These are tough challenges indeed considering Google's scale. No doubt many smaller companies look enviously upon them.

May 12, 2008 12:48 PM PDT

HP in talks to buy EDS

by Erica Ogg

Updated at 2:20 p.m. PST.

Hewlett-Packard is in talks to buy Electronic Data Systems, HP confirmed Monday.

The Wall Street Journal initially reported the two have been in talks for HP to buy EDS for $12 billion to $13 billion, citing unnamed sources. An agreement between the world's largest computer maker and the IT services provider could come as early as Tuesday, according to the Journal.

HP-EDS deal by the numbers
Credit: Susan Dove/CNET News.com

Shares of HP were down 6 percent after the story posted, and HP confirmed that trading of its stock has been halted. EDS shares were up 27 percent on the news.

HP, which is due to report its second quarter earnings on Thursday, issued a press release after the close of the stock market Monday.

"There can be no assurances that an agreement will be reached or that a transaction will be consummated. HP does not intend to comment further until an agreement is reached or discussions are terminated," the statement reads.

EDS also issued a statement Monday afternoon confirming that the two are in "advanced talks," but refused to elaborate.

EDS' revenue in 2007 was $22.1 billion, up 4 percent from the year before. HP's 2007 revenue was $104 billion.

If the acquisition should go through, HP would have a stronger competitive hand against IBM as they compete for business customers. Thanks to IBM's Global Services arm, Big Blue can offer back-end hardware such as servers along with long-term service contracts. But there's one difference: HP still has its PC business, and has spent the last year as the top seller of PCs in the world. IBM, on the other hand, sold its PC arm to Lenovo four years ago.

"I think HP has been wanting in some sense to be more like IBM for quite a while," said Gordon Haff, analyst with Illuminata. "It's perfectly consistent with what HP has been trying to do to become more of a solutions provider rather than (just) a product or technology provider. Whether this is going to make sense is going to turn very much on what kind of price HP can get."

This also isn't the first time HP has taken a crack at acquiring a major consulting firm. In 2000, HP was in talks to acquire PricewaterhouseCoopers. The controversial acquisition was the first big move by then-CEO Carly Fiorina. But a significant earnings shortfall in the fall of 2000, along with significant handwringing on Wall Street, prompted HP to drop the idea. IBM acquired PwC for $3.5 billion two years later, while HP took a dramatically different strategy and acquired PC maker Compaq. Eight years later, the EDS acquisition would seem to bring the two companies back to the same point, albeit as much larger companies.

In recent years, HP has also been spending big on corporate infrastructure software companies, including the acquisitions of Mercury Interactive, Opsware, SPI Dynamics, Bristol Technology, and Peregrine. Combining those pieces of corporate software with a large consulting arm would be a head-on attack on IBM Global Services' ability to sell consulting services around packages such as the Tivoli management software.

The initial take from industry observers is that this deal is Carly 2.0.

"It's somewhat amusing because we've seen this play before. I think this is sort of further evidence that HP really does see value at scale basically, at size," said Haff. "One of the things we've seen very clearly over the last couple years is that Carly really had the right idea, she just couldn't execute on it. She wasn't wrong for saying HP needed to be bigger, effectively. If (the merger) does go through we're going to end up with an HP that looks a lot like Carly wanted it to look."

The difference, he added, is that it looked like Fiorina couldn't operate a company that large, whereas current CEO Mark Hurd appears able.

The task of integrating two large companies and their vastly different technology and corporate cultures is an unenviable one. The upside is that HP would have more to offer to companies with large IT infrastructure needs, and EDS would be able to broaden its reach. But is bigger necessarily better?

"Not all customers need a battleship to deliver their IT services," Forrester analyst Paul Roehrig points out. "Some customers are looking for smaller, more flexible, transparent service providers. In a sense, I'm wondering if HP is trading off success in the smaller deal for larger deals."

Plus, he added, in the past, "Hurd has said they've not been going after (large deals). The question is, is this a strategic inconsistency or all part of a master plan?"

CNET News.com's Jim Kerstetter contributed to this story.

April 30, 2008 6:55 PM PDT

IBM aims to lighten the (energy) load at data centers

by Steven Musil

The data centers used by tech companies to run their Web sites and corporate networks are notorious energy hogs.

The information and communications technology sector currently accounts for about 6 percent of the nation's power consumption, up from about 2 percent to 3 percent in 2000, according to a report in February from the American Council for an Energy-Efficient Economy.

In a report to Congress last August, the Environmental Protection Agency predicted that the amount of power used by U.S. data centers would more than double over the next five years, at a cost of $7.4 billion each year. The EPA also suggested that the nation could save up to $4 billion in energy costs, if it made its data centers more energy-efficient.

Those figures have led many tech giants, such as Microsoft, Google, IBM, and Dell, to get behind efforts to reduce power consumption in data centers. Now IBM is ramping up its business of selling power-saving technologies with new tools designed to track and cap data center energy consumption, including power for air conditioning to cool server computers, according to a report from Reuters. The products were announced at an IBM conference Wednesday in Los Angeles.

IBM is also expanding to 27 countries a program begun last year as part of its Big Green Innovations that lets companies earn and trade certificates awarded for verified energy savings, Reuters reported.

"Energy efficiency has become a critical business metric, like product reliability and customer satisfaction," William Zeitler, head of IBM's systems and technology group, told Reuters. "This is a critically important problem in the industry."

Certainly, Big Blue is landing a lot of the Big Green by helping other companies go green.

The initiative has generated nearly $200 million of technology services contract signings in the first quarter and about $300 million in the fourth, Reuters quoted Chief Financial Officer Mark Loughridge as saying during recent earnings presentations.

Originally posted at Green Tech
April 16, 2008 8:28 PM PDT

Facebook expands Mini-Feed to include Digg

by Harrison Hoffman

Importing Digg stories on Facebook

(Credit: Facebook)

Adding to Tuesday's release, Facebook has added another service for Mini-Feed importing: Digg.

This is a big win for Digg. Over the last six months, I have seen a significant increase in the usage of Digg by college students, and this inclusion in the Facebook Mini-Feed will only improve its reach in that demographic.

Of course, Facebook has expanded greatly beyond its initial college market, and the inclusion of Digg may alert a lot of users to the service for the first time.

A concern that I have with the integration is that your Mini-Feed will probably become really cluttered with Digg stories, if you are a heavy digger. Digg does, however, have a Facebook application that keeps your "dugg" stories neatly in a module on your profile page.

At this point, I haven't decided whether I like the application or the Mini-Feed approach more, but I do think that it is great that Facebook is integrating these third-party sites and turning users on to more Web 2.0 services.

Originally posted at The Web Services Report
Harrison Hoffman is a tech enthusiast and co-founder of LiveSide.net, a blog about Windows Live. He is a member of the CNET Blog Network, and is not an employee of CNET.
April 11, 2008 3:23 PM PDT

Use Google Maps to find Google data centers

by Stephen Shankland

A map view of Google's European data centers.

(Credit: Pingdom)

Ever wonder where Google's data centers are? Now you can use Google Maps to get a good overview.

Pingdom put together a Google map via Wayfaring based on data center location information from Data Center Knowledge. The map shows general areas, so don't expect to zoom in on a satellite photo.

The result is an artful scattering of 36 pushpins, one for each known data center. Yes, that's a lot of data centers, and Data Center Knowledge has some details on Google renting further capacity, too.

And there are likely more tucked away here and there. "Since Google tends to be quite secretive about their data centers in general, the information we have presented here most likely isn't 100 percent complete," Pingdom said.

April 8, 2008 3:17 PM PDT

Cisco spins in data center start-up

by Marguerite Reardon

Cisco Systems said Monday that it has added to its arsenal of data center technology with a new switch and the purchase of start-up Nuova Systems.

Cisco, which already owned 80 percent of Nuova, worked with the start-up to build the new Nexus 5000. On Monday, Cisco introduced the new switch, and also announced that it has bought the remaining 20 percent of the start-up.

Cisco announced its $70 million investment in Nuova in 2006. The company didn't disclose details of the current buyout. But in April of last year, it expanded its funding agreement and raised the maximum potential payout of the transaction to $678 million.

Cisco will continue to use the Nuova technology to further develop and expand the Nexus product line, a new line of data center switches the company introduced earlier this year. The first product in the line was the Nexus 7000. The Nexus 5000 will provide a smaller, fixed version of the product.

The new Nexus switches combine Ethernet switching, IP routing, storage, and security into a single device. And while these switches don't necessarily take over the functions of servers or storage area devices in the data center, they will allow companies to use their servers and storage devices more efficiently.

Cisco has spent more than two decades building its brand as a switching and routing powerhouse. Now, it is tackling the data center, where it hopes its Nexus products will dominate. These new switches are expected to become the next high-ticket item Cisco can sell to large companies to help fuel the company's growth. Cisco hopes the data center will be worth about $10 billion over the next five years, which means it is a key opportunity for a company that needs to grow between 10 percent and 15 percent a year to satisfy Wall Street.

But Cisco's move into the data center could pit it against some of its largest partners, such as IBM, Hewlett-Packard, and EMC. Cisco claims its products complement products from these companies. IBM and Hewlett-Packard are more server-centric. And EMC is more focused on the issue from a storage angle. Meanwhile, Cisco sees the intelligent network, with its Nexus 7000 sitting in the center, as the best answer for virtualizing the data center.

Still, competing with its partners is starting to become a familiar tune at Cisco, which has found itself also competing head-to-head with software giant Microsoft in some areas of its business.
