October 10, 2008 4:00 AM PDT

Academics sink teeth into Yahoo search service

by Stephen Shankland

SUNNYVALE, Calif.--It only took a few years for the science of information retrieval to move from an obscure academic niche to the secretive research departments at the heart of multibillion-dollar Internet companies.

But one of those companies, Yahoo, is trying to give a little more power back to the professors and grad students through a program called BOSS (Build Your Own Search Service). The service lets academics and start-ups build their own search sites around Yahoo's search engine for free, manipulating results however they want.

Two dozen researchers and students from Stanford, the Massachusetts Institute of Technology, Purdue, and other universities met here at Yahoo for a day in September to hear the company's BOSS pitch, show off some ideas they've had for how to use it, and try to coax Yahoo into sharing even more information through BOSS. Overall, their response to Yahoo's program was favorable.

MIT's Harr Chen would love even more data from Yahoo.

(Credit: Stephen Shankland/CNET News)

"It enables a lot of research that we wouldn't otherwise be able to do," said Harr Chen, an MIT researcher at the event.

If it works out as hoped, Yahoo will make some money out of the program: corporate users who reach large scale with BOSS will have to show Yahoo's search ads. The academic side is a step removed from direct revenue, instead giving Yahoo some prominence with potentially influential thinkers in a market Google dominates. Still, piquing the interest of researchers at universities with a reputation for incubating the next big ideas is smart: Yahoo and Google themselves both grew out of Stanford.

And honestly, with Google hogging 63 percent of the U.S. search market to Yahoo's 19.6 percent, what does Yahoo have to lose?

"We're not a market leader," said Prabhakar Raghavan, chief strategist for Yahoo Search. "From a strategic standpoint, it does make sense to let other people innovate on top of us. If the pie grows, our share of the pie grows at the expense of somebody else."

The ultimate hope is that BOSS will mean money, too.

Yahoo has made the investment in a massive infrastructure that constantly scans and re-indexes the Web, filters out some of the dreck, interprets search queries, and provides search results in high volume in very short order. This infrastructure is prohibitively expensive for start-ups, just as it is for academic researchers, so Yahoo is letting companies use BOSS as well. Those operating on a small scale may use BOSS for free, but Yahoo requires larger efforts to either show ads or sign a custom revenue-sharing deal.

Mashing up Yahoo results
One possibility for BOSS is that Yahoo's search results can be combined with other data sets. "Other parties may have more info about their users," said BOSS engineer Vik Singh. For example, a social-networking site can track movies or the activities of friends that could be useful in shaping search results. "This is stuff we may or may not have," Singh said.

Prabhakar Raghavan, chief strategist for Yahoo Search

(Credit: Stephen Shankland/CNET News)

Chengxiang Zhai and Bin Tan of the University of Illinois at Urbana-Champaign showed one example of BOSS in action that uses this idea of modifying Yahoo's search results. Their application steered Yahoo's search engine in particular directions based on the data stored on a user's own computer.

In the example, the computer was able to discern what type of jaguar the user was more likely to be looking for--the cat, not the car or the version of Mac OS X--based on evidence on the computer.

"We believe the client side of personalization has a few advantages over the server side," Zhai said. "It can alleviate concern over privacy and it can provide more information about user activity. And it can naturally distribute computation," so a search company's machines share work with the user's own computer.

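As a rough illustration of the client-side approach Zhai describes, the Python sketch below reorders a list of search results using word counts gathered from text on the user's own machine. It is not the Illinois researchers' application and it does not call any BOSS API: the function names, the result format (dicts with "title" and "snippet" keys), and the simple word-overlap scoring are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_profile(local_documents):
    """Count how often each word appears in text found on the user's machine."""
    profile = Counter()
    for doc in local_documents:
        profile.update(tokenize(doc))
    return profile


def rerank(results, profile):
    """Put results whose snippets share more vocabulary with the profile first.

    sorted() is stable, so tied results keep the search engine's original order.
    """
    def score(result):
        # Each distinct snippet word contributes its count in the local profile.
        return sum(profile[token] for token in set(tokenize(result["snippet"])))

    return sorted(results, key=score, reverse=True)


# Hypothetical local files and hypothetical search results for "jaguar".
local_documents = [
    "Notes on big cats: the jaguar hunts along rivers in the rainforest",
    "Zoo trip photos: jaguar, ocelot and other cats",
]
results = [
    {"title": "Jaguar cars", "snippet": "Luxury sedans and sports cars from Jaguar"},
    {"title": "Mac OS X Jaguar", "snippet": "Apple releases Mac OS X version 10.2"},
    {"title": "Jaguar (animal)", "snippet": "The jaguar is a big cat native to the rainforest"},
]

ranked = rerank(results, build_profile(local_documents))
print([r["title"] for r in ranked])
```

Because the profile is built and applied locally, the raw contents of the user's files never need to leave the machine, which is the privacy advantage Zhai points to.
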
Qualitatively different
Researchers could investigate search and related technologies such as natural-language processing (NLP) without BOSS. But with it, that research is vaulted into a different domain. It isn't just a matter of taking more time; with BOSS's vast index of the Web, the possibilities are qualitatively different.

"You gain enormously from access to the data. There are all sorts of things you can do with tons of data" that you can't with a smaller set, said Stanford's Christopher Manning.

Manning works in the active field of natural-language processing, technology that aims to let computers discern the meaning of real human speech or text. It is behind search technology from start-up Hakia and Microsoft-acquired Powerset. NLP benefits tremendously from having large-scale data sources, Manning said.

"To understand what words mean, you look at how they're used. We do that on a large scale, (examining) usage and context to learn about meaning," Manning said.

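Manning's point about learning meaning from usage can be sketched in a few lines of Python. The example counts the words that appear near each word and compares those context counts with cosine similarity. It is a toy version of the general idea, not Manning's own method; the function names, the window size, and the three sample sentences are assumptions for illustration, and real systems rely on vastly more text.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For each word, count the words that occur within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical miniature corpus.
sentences = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the car needs new tires",
]
vectors = context_vectors(sentences)

# Words used in similar contexts score higher than words used differently.
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))
```

With only three sentences the comparison is crude, which is the researchers' argument for wanting a Web-scale index: the more usage examples, the sharper the context counts become.
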
Please, sir, I want some more
It also was clear the researchers' appetites were whetted by BOSS. Nobody sounded ungrateful, but heck, as long as Yahoo is sharing some important data, why not share a little more?

Yahoo is headed in that direction. On the research day, it opened up access to another slice of search-related "prisma" data.

Vik Singh, an engineer behind Yahoo BOSS

(Credit: Stephen Shankland/CNET News)

Prisma powers Yahoo's search assist feature, which suggests searches based on what people have begun to type into the search box and can make searching more convenient for users. For researchers trying to build more technology atop Yahoo search results, though, prisma data is bigger than that. For example, it can show a search term's variations, its membership in categories such as place names, movies, and government, and the likelihood that people search for the term by itself or as part of a larger query.
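
One of those signals, how often a term is searched on its own rather than inside a longer query, is easy to picture with a small Python sketch. The format of Yahoo's prisma data is not described here, so this is only an illustration computed from a made-up query log; the function name and the log contents are assumptions.

```python
def standalone_rate(term, query_log):
    """Fraction of queries containing `term` in which it is the entire query."""
    term = term.lower()
    alone = 0
    containing = 0
    for query in query_log:
        words = query.lower().split()
        if term in words:
            containing += 1
            if words == [term]:
                alone += 1
    # Avoid dividing by zero when the term never appears in the log.
    return alone / containing if containing else 0.0


# Hypothetical query log.
query_log = [
    "jaguar",
    "jaguar xk price",
    "jaguar habitat",
    "jaguar",
    "mac os x",
]
print(standalone_rate("jaguar", query_log))
```
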

"That's got a lot of potential," said Dan Ramage, a natural-language processing Ph.D. candidate at Stanford. Ramage said BOSS is useful for his research, which focuses on determining the various relationships that can connect a pair of words, but he'd like it better if he had more control over the snippets of text Yahoo shows with its search results.

Yes, Yahoo will share more
Yahoo plans to release more. "Over time you'll see we'll offer a lot more ingredients, a lot more power," said Ashim Chhabra, senior product manager with the BOSS project.

Some researchers are hungry for as much as they can get. Chen, for example, hoped Yahoo could become an engine to run software supplied by researchers that plumbs its entire Web index.

"We give you a little code, you run that code on every document, then you give us a number," Chen suggested. It would be useful, for example, "to track evolution of themes and memes on the Web, different buzz trends."

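Chen's proposal amounts to running a small researcher-supplied function over every document and combining the per-document results into a single number. The Python sketch below shows that shape under loud assumptions: the "index" is just a list of strings standing in for Web pages, Yahoo offers no such interface, and every name here is hypothetical.

```python
def run_on_every_document(documents, researcher_code, combine=sum):
    """Apply researcher_code to each document and combine the results into one number."""
    return combine(researcher_code(doc) for doc in documents)


def mentions(term):
    """Build a per-document function that counts occurrences of `term`."""
    needle = term.lower()

    def count(doc):
        return doc.lower().count(needle)

    return count


# Hypothetical stand-in for a Web index.
documents = [
    "Rickroll videos spread across blogs",
    "Another rickroll, another rickroll",
    "An unrelated page about gardening",
]
print(run_on_every_document(documents, mentions("rickroll")))
```

Running the same function against snapshots of the index taken at different dates would give the kind of trend line Chen has in mind for tracking themes and memes.
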
Graham Mudd, product marketing manager for Yahoo search, said the idea is "not as crazy as you think," though he also gave the impression that researchers shouldn't hold their breath for that level of access. But Yahoo clearly wants to offer what it can.

When it comes to search research, "The pool of talent is divided between half a dozen companies," Raghavan said. "We think it behooves us to open up."

Originally posted at Cutting Edge
Stephen Shankland writes about a wide range of technology and products, but has a particular focus on browsers and digital photography. He joined CNET News in 1998 and since then also has covered Google, Yahoo, servers, supercomputing, Linux and open-source software, and science. E-mail Stephen, or follow him on Twitter at http://www.twitter.com/stshank.
(6 Comments)
by n3td3v October 10, 2008 11:21 AM PDT
Its actually BYOSS, not BOSS... yahoo is so retarded, its little things like this that make people not like yahoo anymore.
by Shankland October 10, 2008 12:40 PM PDT
Yes, watching people's tortured acronym creation process is always painful.
by JetJaguar October 10, 2008 2:14 PM PDT
I'm glad to see Yahoo! focusing on its first love, search. I know originally it was a "directory, not a search engine", but for all intents and purposes, it's about finding things. I think Yahoo! can compete with Google in the realm of search, and not worry so much about offering everything possible, with a snack bar on every other page.
by globalist_agenda October 11, 2008 1:44 PM PDT
Yahoo "spam" filter is brain dead. Why can't this company get the most basic things to work right? You would think that after I told the spam filter a dozen times already that "make $97 an hour working at home" is spam it would figure it out by now. I received at least two dozen "Bank of Burkina Faso" and "You have won the Irish lottery" emails. Sorry Yahoo, but it's 2008 not 1998.
by Imalittleteapot October 12, 2008 12:43 PM PDT
I always wanted to win the irish lottery.
by bridge solution October 12, 2008 5:25 PM PDT
the elegance of the yahoo paradigm is shown in the "jaguar" example. there is anecdotal evidence that google is getting that computation added by measuring time per page, server side: i.e., that if "jaguar" gives me a page 1 of garbage about apple that has me hitting next page as fast as possible, the next time i search "jaguar" the cat has a statistical chance of showing up. i would state this as google logging a "link" between "me" and cat, regardless of what pages link to each other.
Ads served to me by yhoo tend to be more accurate than ones by goog.

and when yhoo serves me 300 emails about "3 inches".....i know someone whose machine connects to my address book has had their machine zombied, and yhoo knows better than to deny me contact from them, and waits for their isp to tell them they been zombied.
19% of 3 is bigger than 19% of 2.2...without stealing from anyone's market share. coaxing the world to 3.0 is, in my pov, truly clear thinking by yhoo...which in 1996 suggested to me after a 3 hour human guinea pig job of replying to its heuristic, i might want to subscribe to the journal of the danish girl scouts.
the google behemoth??..size is everything in nature, and that's why the dinosaurs rule the earth.... o... never mind.