Yahoo is ready to integrate real-time results from Twitter directly onto its search pages.
(Credit: Yahoo)
Not to be outdone by its rivals, Yahoo is getting into the real-time search business as well.
Days after Google announced its plan for integrating content from sources such as Twitter and blogs, Yahoo on Thursday plans to launch its own feature to integrate tweets into search results. Microsoft already displays Twitter results for queries placed on its Bing search engine, although they are displayed on a separate page that is not directly integrated into the main search results.
Yahoo will join Google with integrated results as of Thursday, said Larry Cornett, vice president of product management and design at the company. But in a crucial difference between the two approaches, Yahoo has not cut a deal with Twitter for access to the "firehose," an automated feed of data from Twitter. Instead, it's using Twitter's public API and adding its own algorithms to figure out which tweets are most relevant to the query.
The thorniest problem with real-time search is relevancy. So much content is created every second on the Internet--from tweets to status updates to new blogs to new news stories such as this one--that it's a challenge to simply capture that data, let alone decide which sources of data are more relevant and authoritative than others.
Yet there's clear demand for answers to the question, "What is happening right this second?" Search engines are presumably in the best position to deliver those answers, but unless they can find a way to harness the flood of real-time information and make sense of it, these services are unlikely to be very useful.
For hot topics, such as Obama or Tiger Woods, Yahoo plans to use the Twitter tab it added to the News Shortcut feature already found in Yahoo search results. For other topics that are gaining traction but don't necessarily have a huge amount of news, photos or videos already associated with that query, Yahoo will surface three tweets related to the topic and chosen by its algorithms, Cornett said.
The main problem with Yahoo's approach is that it's not exactly real-time: the most recent results surfaced during a demonstration were 15 to 20 minutes old, and the user must manually refresh the page to get new results. Google's approach not only refreshes automatically due to its use of Twitter's firehose feed, but it also brings in content from sources other than Twitter.
The other major problem for Yahoo, of course, is that its search share is dropping, something Yahoo CEO Carol Bartz blamed on expiring toolbar deals during an investor conference Tuesday. While Yahoo says it is committed to remaining a player in the search market by coming up with new ideas for search presentation, this week shows just how easy it is for Google to take a similar idea (real-time search) and put out a similar-if-not-better take on the same idea.
Before too long, expect to find anything that anyone puts on the Internet on Google within seconds: with luck, it might even be useful.
Real-time search has come to Google. The company has been hinting at this day for several months, most recently when it announced a deal to access Twitter's "firehose" of data. But it presented its vision for real-time search before the media Monday at the Computer History Museum, claiming to have made a little history of its own.
Over the next few days, Google users will start to notice a box called "Latest results" on the main search results page for topics popular enough to generate a steady stream of fresh results. Google used "Obama" as its example: searches for that query display a new box that automatically scrolls through recent "real-time" results associated with that topic from sources like Twitter, FriendFeed, and Google News, as well as new Web pages--such as this story--as they are created.
The concept is hot in the search world: Microsoft's Bing also displays updates from Twitter and various blogs, although those results are not integrated with the main page. And Yahoo has also signed up with a company called OneRiot to throw its hat into the real-time search wars.
What's less clear, however, is how useful this technology will be unless Google and others working on the problem can bring the same degree of relevance and trust to real-time results that Google brings to regular search results. Google News can already confuse the casual user who wonders how and why those particular headlines were singled out, so how will relevancy work when a stream of news can knock a particularly authoritative result off your screen in seconds?
"It's a very hard problem. Language understanding is still an unsolved problem," said Amit Singhal, a Google Fellow and one of the key players in developing this product. "Not only do we have to understand what someone is saying, but we have to get to the deeper semantics of what is indeed true. We have to work through many issues. Truth ends up being a rather vague notion."
In a way, this challenge is right up Google's alley. The company is obsessed with speed when it comes to presenting results, agonizing over whether design changes that add tenths of seconds to page-loading times are worth the effort.
And now that seemingly everyone has a blog, a microblog, a social-networking profile, and a commenting identity (or 29), new content on the Internet is being generated at an astounding pace. Google used to think it would be able to index all the world's information in about 300 years, but CEO Eric Schmidt told CNET in November that one of Google's greatest challenges in the decades ahead will be staying abreast of the explosion in content enabled by social media.
That's why it's a bit surprising that Google, the world's leading search engine by a wide margin, hasn't necessarily been a leader in this area. Marissa Mayer, vice president of search and user experience at Google, admitted Monday the company could have moved more quickly to organize the vast amount of data produced by services such as Twitter. Anyone who has tried to use Twitter Search knows that real-time search at the moment is like the regular Internet was 10 years ago: a blast of information that's impressive in its scope but overwhelming to sift through.
But what Google is trying to do is leapfrog the notion of Twitter as the vanguard of the real-time content explosion. Twitter is undeniably hot at the moment, but new Web pages are generated constantly, especially as traditional media companies move online. One need only think back to this summer, when news reports of Michael Jackson's death sent millions online looking for confirmation, staggering services such as Google and Twitter under that load.
What will Google's real-time search look like the next time somebody famous dies?
(Credit: Google)
Google said it plans to display all kinds of Internet content in its "Latest results" box. Google didn't pay Twitter an undisclosed amount of money for access to its feed for no reason, however: real-time content is generated so quickly that it can be harnessed much more easily if search providers such as Google have that information pushed to them, rather than having to pull it out of the Web themselves.
That raises the question of just how Google will index and rank real-time results. The company needs to develop the real-time equivalent of PageRank, which evaluates Web pages by the number and importance of the other pages linking to them. That's something Google "is beginning to experiment with," Mayer said in a question-and-answer session following Google's presentation.
There's definitely some way to do that, but it certainly is not a simple problem. Someone with 15,000 Twitter followers is not necessarily as authoritative in one area as they are in another, and Google will have to figure out some way to evaluate this information to make it truly useful.
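PageRank itself is well documented, so it is easy to see what a "real-time equivalent" would have to replicate. Below is a minimal Python sketch of the classic power-iteration form of PageRank, using the commonly cited damping factor of 0.85. The graph could in principle be accounts and retweets instead of pages and links; that application is purely an illustrative assumption, not anything Google has described.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each node (a page, or hypothetically an account) to the nodes
    it points to (links, or hypothetically retweets)."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Every node receives the "random jump" share regardless of links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node splits its current rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node with no outlinks: spread its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

On the three-node graph {"a": ["b", "c"], "b": ["c"], "c": ["a"]}, node c ends up with the highest score because it collects links from both a and b, while b, which receives only half of a's vote, ranks lowest.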
Until then, however, news junkies can entertain themselves watching the Latest results section spin with updates on Tiger Woods' latest paramour or the glacial progress of Congress' attempt to pass health-care reform legislation.
In a roughly 10-second period Monday afternoon on Google's Trends page, where it is testing out the real-time service, the feed for "Pearl Harbor Day"--the second most popular trend on the Internet Monday behind the aforementioned Tiger Woods--produced a tweet about a Pearl Harbor Day poem, a news story on people who were in Pearl Harbor on December 7, 1941, and a gentleman celebrating Ruby Diner's 27th anniversary with a $2.70 Rubyburger. (He also happened to note in his tweet that it was Pearl Harbor Day.)
Google's new real-time search interface automatically updates search results for hot topics like Tiger Woods, without requiring a browser refresh.
(Credit: Screenshot by Tom Krazit/CNET)
Google announced Monday the fruits of its earlier deal with Twitter, showing off how it has decided to present real-time Internet content within search results.
Amit Singhal, Google fellow, introduced the real-time section during an event at the Computer History Museum in Mountain View, Calif. "We are here today to announce Google real-time search," Singhal said, calling it "Google relevance technology meets the real-time Web."
Google fellow Amit Singhal explains Google's strategy on how to present real-time search results.
(Credit: Stephen Shankland/CNET)
Twitter search will show the latest matches for a particular search term, but Google wants to do more than sort results by time. "Relevance is the foundation of this product," Singhal said. "It's relevance, relevance, relevance."
Google will build a section called "latest results" into the regular Google search results page that automatically refreshes Internet content from sources like Twitter. Singhal showed off how a search for "Obama" would bring up tweets, Web pages, and other Internet content related to the president as it was generated. At the Web 2.0 conference in October, Google struck a deal with Twitter to get access to the service's "firehose" of tweets.
Updated 11:13 a.m. PST: Google plans to roll this out over the next several days, and not all users may see the new section immediately, Singhal said. The company also announced partnerships with social-networking companies Facebook and MySpace to display updates from those services.
Updated 11:22 a.m. PST: Real-time search at Google involves more than just social-networking and microblogging services. While Google will get information pushed to it through deals with those companies, it also has improved its crawlers to index and display virtually any Web page as it is generated. Facebook updates posted to public Facebook pages will be indexed, while any MySpace update designated as public will appear in search results.
Updated 11:30 a.m. PST: Google also demonstrated a Google Labs project called "Google Goggles," which allows a smartphone user to take a picture of a given object and send it to Google in hopes of finding out more information about that object. Up until the real-time announcement, mobile search was ruling the day, as Google's Vic Gundotra demonstrated Google Goggles, a new Android application that can show locations of interest surrounding a GPS position, and the ability for Japanese speakers to now use Google's voice search features.
Updated 12:42 p.m. PST: Marissa Mayer, Google's vice president of search and user experience, said real-time search took Google somewhat by surprise. "I wish we'd had the foresight to see this," she said.
Marissa Mayer, Google's vice president of search and user experience, speaks at a Google search event Monday.
(Credit: Stephen Shankland/CNET)
Indeed, many people position Twitter, not Google, as central to the process of finding out what's going on right now.
There's a challenging balance between assessing what's new about a subject and what's correct, but Google believes the real-time search results will actually lead people to the truth faster, Singhal said. How do you assess the latest rumor when it can take time for the truth to emerge?
"Right now a straightforward answer is we emphasize quality and relevance. That often brings the truth out," Singhal said.
And when Google is deciding whether to include your own online musing, you're not just as good as your latest tweet. Just as it uses PageRank and other mechanisms to establish authority of a Web page for search, Google will apply its own measurements to those whose updates appear in real-time results.
Retweets and the number of followers a person has factor into Google's assessment of quality, he said.
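Google didn't say how followers and retweets are weighted. The sketch below is a hypothetical scoring function, assuming log-scaled counts and made-up weights, meant only to show why an author with a modest but heavily retweeted following could outrank one with a larger, quieter audience. The Author fields, the weights, and the function names are illustrative inventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Author:
    handle: str
    followers: int
    retweets: int  # how often this author's posts were retweeted (hypothetical field)


def authority(author: Author, w_followers: float = 1.0, w_retweets: float = 2.0) -> float:
    """Toy authority score; the weights are illustrative assumptions."""
    # log1p damps very large counts, so 1,000,000 followers
    # does not score 1,000 times higher than 1,000 followers.
    return (w_followers * math.log1p(author.followers)
            + w_retweets * math.log1p(author.retweets))


def rank_authors(authors: list[Author]) -> list[str]:
    """Return handles ordered from most to least authoritative."""
    return [a.handle for a in sorted(authors, key=authority, reverse=True)]
```

With these assumed weights, rank_authors([Author("big_audience", 15000, 10), Author("often_retweeted", 500, 400)]) puts "often_retweeted" first, since its retweet term outweighs the larger follower count.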
Updated 2:02 p.m. PST: The real-time search feature is computationally difficult, and Google had to develop more than a dozen technologies to get it working, Singhal said. Not only must it constantly monitor innumerable accounts for the latest updates, it must assess their quality and their relevance to particular queries.
Those who don't yet see the service can get to a version of it through the Google Trends site, which just emerged from beta testing. The "hot topics" area there shows items of high search interest at the moment, and clicking on one of them brings up search results with the scrolling real-time feed of information.
It's all part of getting people what they want, whether they know they want it or not. Mayer shared an example of a person buying a baby stroller.
"If you bought a product, you'd feel really foolish not knowing there was a recall," Mayer said.
And that challenge these days increasingly is a real-time phenomenon.
"In the early days of Google, we used to crawl (the Web for) information every month, then put up new index," a process called the Google dance, Singhal said. "A month was not fast enough. Then we were crawling the Web every few days, then every day, then every few hours. Now we can crawl every few minutes."
"In today's world that's not fast enough," Singhal said. "In this information environment, seconds matter."
New start-up Factery Labs is launching its first service on Tuesday, a technology called FactRank that can tear through Web pages and collect what it calls "facts." These are bits of information from each source page that Factery Labs' algorithm then organizes into an order of importance.
What this means for you is that developers will soon make use of the technology in third-party search engines or on Web pages to very quickly deliver reading summaries. This cuts out most (or all) of the parts you don't care about, while organizing the bits you might. It also manages to do all this in real time.
The FactRank technology was created by Paul Pedersen, who has a good background in search, including gigs at Inktomi, Google, and Powerset. CNET News met with him and co-founder Sean Gaddis (formerly of Skype and eBay) on Monday to get a demo of how the technology works.
In a nutshell, it goes like this: FactRank goes through each Web page or source (in whatever index it's searching) looking for semantic tip-offs such as declarative sentences. It then cross-references those statements against one another, surfacing the most relevant ones to the top while also factoring in the order in which they appeared. What the user gets is a tidy list of statements, each of which is sourced and given a level of relevancy based on how often it appears across all of the indexed source pages combined.
Whew. Got that? Great, here's an example of what it looks like in motion, as seen on a basic search for Sarah Palin on Twitter:
One of the Factery Labs example applications is a search engine that finds facts from Twitter source results.
(Credit: CNET)
Of course, one of the problems with Factery Labs' approach across multiple sources--be it Twitter or multiple URLs--is accuracy: how can it tell that something like The Onion is not the same as the Associated Press?
The short answer is that it can't. Factery Labs can't determine the truth value of what it finds, nor will it ever. "It goes beyond any existing technology. And nobody knows how to do that. I mean, I don't even know how to do that--people don't even know how to do that," Pedersen said. "We are absolutely neutral. We have nothing in the system that has any bias in terms of anything. The only mechanism we maintain is egregious spam, the bad guys."
Along with maintaining a blacklist of these bad sites, Factery Labs also keeps a list of good sources, or ones that continuously deliver. The more often an author successfully recommends a usable page, the faster they'll accumulate rank among the results.
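Factery Labs hasn't published FactRank's internals, so the following is only a toy approximation of the pipeline Pedersen described: treat each sentence ending in a period as a candidate "fact," skip blacklisted sources, and rank statements by how many distinct sources repeat them, breaking ties by how early they appear. The sentence splitting and normalization are crude assumptions standing in for real semantic analysis, the function names are hypothetical, and the "good sources" list is not modeled.

```python
import re
from collections import defaultdict


def candidate_facts(text):
    """Crude stand-in for "semantic tip-offs" (an assumption):
    split text into sentences and keep only those ending in a period."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in sentences if s.strip().endswith(".")]


def normalize(sentence):
    """Lowercase and drop punctuation so near-identical statements match."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower()).strip()


def rank_facts(pages, blacklist=frozenset()):
    """pages maps a source name to its text; returns (fact, sources) pairs, best first."""
    hits = defaultdict(list)  # normalized fact -> [(source, position, original sentence)]
    for source, text in pages.items():
        if source in blacklist:  # skip known spam sources entirely
            continue
        for position, sentence in enumerate(candidate_facts(text)):
            hits[normalize(sentence)].append((source, position, sentence))

    def sort_key(occurrences):
        sources = {src for src, _, _ in occurrences}
        mean_position = sum(pos for _, pos, _ in occurrences) / len(occurrences)
        # More corroborating sources first; earlier appearance breaks ties.
        return (-len(sources), mean_position)

    ranked = sorted(hits.values(), key=sort_key)
    return [(occ[0][2], sorted({src for src, _, _ in occ})) for occ in ranked]
```

Because corroboration counts distinct sources, a statement repeated on two independent pages outranks one that appears only once, and nothing from a blacklisted source is ever counted.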
What you can play with today
As for applying that technology to some consumer products, Factery Labs is launching with a handful of development partners, each of which has already built a tool that makes use of FactRank. The most notable one comes from Sobees, which is using the service to add relevancy to Twitter and FriendFeed search results--something that's no small feat.
Users can do a search on Sobees' Silverlight-based Twitter client as usual, but there will now be a FactRank button that can sort through those tweets. It does a quick once-over of all of the results, and will filter the most relevant information to the very top. Included in each of its results is also a shortlist of the facts it finds on every page.
One of the first third-party apps to make use of Factery Labs is Sobees, which is adding its fact finding filters and relevancy tools to Twitter and FriendFeed search.
(Credit: Factery Labs)
Advanced users might find more utility in an updated version of Ultimate Info, an extension for Firefox that does a number of things with on-page data. Starting Tuesday, it will let users select links on a page, each of which gets the fact-finding treatment using FactRank.
In our demo, Gaddis used Ultimate Info on the front page of popular site Drudge Report, highlighting about six or seven URLs that were on the page, then running a FactRank query, which brought in its fact results in just a few seconds. As Pedersen explained, users could run something similar on a long article (or several long articles about the same subject), and FactRank's algorithm would be able to provide a fact summary in short order.
Mobile devices, where the company expects to see the most development, aren't part of Tuesday's launch. "Our analysis shows that mobile devices are a prime target for this technology because the latency produces a lot of resistance in the browse experience," said Pedersen. Instead of getting back a link dump of all the URLs it finds, the user would have the FactRank engine go out, process those results, and then deliver a summary of the best selection of facts, saving the end user from having to wait for extra pages to load.
If you want to give some of the third-party Factery Labs tools a run, you can find them in the implementations section of the company's site, along with a test search engine that runs off of Twitter's index.
Bing's Twitter search starts with a zeitgeist view.
(Credit: Screenshot by Rafe Needleman/CNET)
Microsoft is getting into the real-time search business, as we reported earlier Wednesday from the Web 2.0 Summit. It's good to see a mainstream product dive into this stream, as one of the big issues with searching Twitter is that timeliness can swamp relevancy.
Bing has the opportunity to leverage its well-developed search engine chops to address this: not only will public tweets show up in search results, but Bing can rank those results based on the relevance of the post, the popularity of the writer, and other, more complex factors.
Uncharacteristically for Microsoft, the new search feature went live shortly after the announcement. (We're told the Facebook integration, which was also announced, will be rolled out in the future.) Here's how Twitterized Bing works for users so far:
The main page gives a nice overview of trending topics, with a search cloud at the top of the page and a list of popular links that are being shared below it. It's a good way to get a sense of the buzz on Twitter at any moment.
Bing's basic Twitter search result page gives you both live tweets (at the top), and shared links (below).
(Credit: Screenshot by Rafe Needleman/CNET)
Search results pages themselves are likewise split into two sections, a live feed at the top with just four tweets, and a list of shared links at the bottom. Results stream in live at the top of the page, but you can pause the influx.
The "Best match" search juggles the display order to put relevant tweets up top, even if they're not the most timely.
(Credit: Screenshot by Rafe Needleman/CNET)
If you click on the link to "see more tweets" on the main result page, you get a full page of tweets on your query, with the interesting option to sort the results by "Best match." If you choose this, Bing takes a stab at ranking results based on their content and possibly other factors, like popularity and online status of the writer.
Timeliness is still a factor in "Best match" results, so you won't get day-old tweets at the top of the list on a hot topic, but adding a relevancy sort on top of that does make the search results more useful. This is especially true for hot topics where tweets feeding into a time-only sort can end up pushing useful and relevant content right off the page.
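Microsoft didn't disclose how "Best match" combines relevance with timeliness. One common way to express that trade-off, shown here as an assumption rather than Bing's method, is to multiply a relevance score by an exponential recency decay. The term-overlap relevance, the 60-minute half-life, and the function names are illustrative choices.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    age_minutes: float


def relevance(query: str, text: str) -> float:
    """Fraction of query terms that appear in the tweet (a crude assumption)."""
    terms = set(query.lower().split())
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def best_match(query: str, tweets: list[Tweet], half_life_minutes: float = 60.0) -> list[Tweet]:
    """Rank by relevance, discounted by age: a tweet's weight halves every half-life."""
    def score(t: Tweet) -> float:
        recency = 0.5 ** (t.age_minutes / half_life_minutes)
        return relevance(query, t.text) * recency
    return sorted(tweets, key=score, reverse=True)


def most_recent(tweets: list[Tweet]) -> list[Tweet]:
    """The time-only sort, for comparison."""
    return sorted(tweets, key=lambda t: t.age_minutes)
```

A time-only sort puts the newest tweet first even if it's noise; the blended sort lets a relevant 20-minute-old tweet beat it, while a day-old tweet's score decays toward zero.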
Bing unpacks short URLs to show you what people are sharing, with no surprise links.
(Credit: Screenshot by Rafe Needleman/CNET)
Back on the main result page, there are links related to your search query. These are automatically unpacked from URL shorteners like Bitly. Under each link result are the tweets that included a short link to that page, even if different shorteners were used to get there. Bing's Twitter search thus does a good job of pulling together commentary on a topic (a link), even from people who've never communicated with each other on the service.
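Grouping tweets under the page they point to is straightforward once each short URL is expanded. In this sketch the resolver is passed in as a function: resolve_via_http follows redirects with Python's standard urllib, while a lookup table can stand in when no network is available. The function names are hypothetical, and the regex is a crude assumption that would, for example, swallow trailing punctuation.

```python
import re
import urllib.request
from collections import defaultdict
from typing import Callable, Dict, List

# Crude URL matcher (assumption): grabs anything from http(s):// to the next space.
URL_PATTERN = re.compile(r"https?://\S+")


def resolve_via_http(url: str, timeout: float = 5.0) -> str:
    """Follow redirects (e.g. from a bit.ly link) and return the final URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def group_by_target(tweets: List[str], resolve: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each expanded destination URL to the tweets that linked to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tweet in tweets:
        # Expand every link first, so two shorteners pointing at the same page
        # land in the same group, and a tweet is counted once per destination.
        targets = {resolve(url) for url in URL_PATTERN.findall(tweet)}
        for target in targets:
            groups[target].append(tweet)
    return dict(groups)
```

Passing the resolver in keeps the grouping logic testable offline; for example, group_by_target(tweets, lambda u: table.get(u, u)) uses a dictionary of known expansions and leaves unknown URLs unchanged.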
None of what Bing does with Twitter is startlingly new. Twitter's own search gives great real-time Twitter results. Other engines like Twazzup and Scoopler combine relevancy rankings into their results. And OneRiot does a very good job with shared links. But it is good to see real-time content start to bleed into mainstream search. It could be useful and relevant for everyone.
But this story won't get truly interesting until the real-time feeds, from Twitter and elsewhere, start to infect the mainstream Web search results. When a trending topic or popular shared link on Twitter starts to change the way standard results are ranked, we'll start to have truly real-time search for all content. Twitter will have an impact across the Web, even for people who never use it.
Twitter messages from prominent writers like All Things D's Kara Swisher are now in Bing search results.
(Credit: Bing)
Microsoft is trying to get a leg up in the real-time search wars by adding Twitter messages to search results.
Bing will now surface results for certain celebrities (leading to the odd pairing of search guru Danny Sullivan and American Idol host Ryan Seacrest in the same sentence) when users search on their names and "twitter," the company announced Wednesday afternoon. It's not indexing all of Twitter, instead picking "a few thousand people to start" and using Twitter's public API to display those results in a special box among the other search results, such as stories that a person might have written about Twitter.
Amid all the nauseating Twitter adoration of late lies a real trend within the search community: the desire to display search results that contain items from real-time communication services. Right now, this is done haphazardly by the Big Three, although smaller companies are trying to offer this service for those who just can't wait.
Both Google and Yahoo, for example, will return the main Twitter page and a single tweet as the top two search results for "Ryan Seacrest Twitter." They don't call out multiple tweets within a single defined box, as Bing will now do with the new feature.
Bing's Twitter feature is rolling out gradually throughout the day on Wednesday.
Collecta is a new real-time search engine that taps into Twitter, Flickr, blog comments, and news sites--all at once. Users are able to quickly filter which sources they want to search from, and can leave multiple searches running continuously, so that the latest content keeps rising to the top.
What I really like is that you can just leave it running in the background, and come back to check on searches throughout the day. I often do the same thing with TweetDeck with Twitter searches, but what's nice about Collecta is that it's grabbing search results from multiple sources.
Another nice feature is that you can preview most types of content without leaving the results. This is more useful for photos and blog posts than for messages from microblogging services like Twitter, Jaiku, and Identi.ca--the three it culls from. It also only captures images from Twitter, but I expect it to add more in the future.
Collecta lets you keep multiple searches running in the background. You can hop between them any time, and filter the sources to choose where you want it culling from.
(Credit: CNET)
Where these tools still have a ways to go is in weighing content from certain sources more heavily than others and in weeding out duplicate entries. Competitor OneRiot does a good job at this by judging relevancy based not only on time but also on which results have been shared most heavily among other users on a specific source.
Note: At the time of publishing this post, the site is no longer functioning correctly. When I began to use the service right after its launch, it was coming up with results almost instantly; now it's just continuously searching without showing any results. Since I've seen it work so quickly, I'm willing to put it down to some launch jitters, but you may have to wait until later in the day to give it a proper go.
See also Wednesday's launch of Twitter search engine CrowdEye, which was made by the former head of Microsoft's search unit.