New start-up Factery Labs is launching its first service on Tuesday, a technology called FactRank that can tear through Web pages and collect what it calls "facts." These are bits of information from each source page that Factery Labs' algorithm then organizes into an order of importance.
What this means for you is that developers will soon make use of the technology in third-party search engines or on Web pages to very quickly deliver reading summaries. This cuts out most (or all) of the parts you don't care about, while organizing the bits you might. It also manages to do all this in real time.
The FactRank technology was created by Paul Pedersen, who has a good background in search, including gigs at Inktomi, Google, and Powerset. CNET News met with him and co-founder Sean Gaddis (former Skype and eBay'er) on Monday to get a demo of how the technology works.
In a nutshell, it goes like this: FactRank goes through each Web page or source (in whatever index it's searching from), finding semantic tip-offs like declarative sentences. It then cross-references those against one another, surfacing the most relevant ones to the top and factoring in the order in which they appeared. What the user then gets is a tidy list of statements, each of which is sourced and given a level of relevancy based on its appearances in all of the indexed source pages combined.
Whew. Got that? Great, here's an example of what it looks like in motion, as seen on a basic search for Sarah Palin on Twitter:
One of the Factery Labs example applications is a search engine that finds facts from Twitter source results.
(Credit: CNET)
Of course, one of the problems with Factery Labs' approach across multiple sources, be it Twitter or a set of URLs, is accuracy: how can it tell that something like The Onion is not the same as the Associated Press?
The short answer is that it can't. Factery Labs can't determine the truth value of what it finds, nor will it ever. "It goes beyond any existing technology. And nobody knows how to do that. I mean, I don't even know how to do that--people don't even know how to do that," Pedersen said. "We are absolutely neutral. We have nothing in the system that has any bias in terms of anything. The only mechanism we maintain is egregious spam, the bad guys."
Along with maintaining a blacklist of these bad sites, Factery Labs also keeps a list of good sources, or ones that consistently deliver. The more often an author successfully recommends a usable page, the faster they'll accumulate rank among the results.
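To make the process described above a little more concrete, here is a minimal Python sketch of the general idea: pull declarative-looking sentences out of each page, cross-reference them across sources, and rank them by how many pages back them up and how early they appear. This is not Factery Labs' code. The function names, the "ends with a period" test for declarative sentences, the scoring formula, and the `blocked_sources` blacklist parameter are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_declarative(sentence):
    # Crude stand-in for a "semantic tip-off" (an assumption, not FactRank's test):
    # the sentence ends with a period and has at least four words.
    return sentence.endswith(".") and len(sentence.split()) >= 4


def normalize(sentence):
    # Lowercase and drop punctuation so identical statements match across pages.
    return " ".join(re.findall(r"[a-z0-9']+", sentence.lower()))


def rank_facts(pages, blocked_sources=frozenset()):
    """pages maps a source URL to its text.

    Returns a list of (statement, sources, score), highest score first.
    """
    support = defaultdict(set)   # normalized statement -> sources containing it
    first_position = {}          # normalized statement -> earliest sentence index
    original = {}                # normalized statement -> first wording seen
    for url, text in pages.items():
        if url in blocked_sources:
            continue  # hypothetical blacklist of spam sources
        for position, sentence in enumerate(split_sentences(text)):
            if not is_declarative(sentence):
                continue
            key = normalize(sentence)
            support[key].add(url)
            original.setdefault(key, sentence)
            first_position[key] = min(position, first_position.get(key, position))
    ranked = []
    for key, sources in support.items():
        # Assumed scoring: one point per supporting source, plus a bonus
        # (at most 1) for statements that appear early on a page.
        score = len(sources) + 1.0 / (1 + first_position[key])
        ranked.append((original[key], sorted(sources), score))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

Calling `rank_facts({"a": "...", "b": "..."})` on two page texts returns sourced statements with the ones repeated across pages at the top. Note that nothing here checks whether a statement is true, which matches Pedersen's point: the sketch only measures agreement and placement.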
What you can play with today
As for applying that technology to some consumer products, Factery Labs is launching with a handful of development partners, each of which has already built a tool that makes use of FactRank. The most notable one comes from Sobees, which is using the service to add relevancy to Twitter and FriendFeed search results--something that's no small feat.
Users can do a search on Sobees' Silverlight-based Twitter client as usual, but there will now be a FactRank button that can sort through those tweets. It does a quick once-over of all of the results, and will filter the most relevant information to the very top. Included in each of its results is also a shortlist of the facts it finds on every page.
One of the first third-party apps to make use of Factery Labs is Sobees, which is adding its fact finding filters and relevancy tools to Twitter and FriendFeed search.
(Credit: Factery Labs)
Advanced users might find more utility in an updated version of Ultimate Info, an extension for Firefox that does a number of things with on-page data. Starting Tuesday, it will let users select links on a page, each of which gets the fact-finding treatment using FactRank.
In our demo, Gaddis used Ultimate Info on the front page of popular site Drudge Report, highlighting about six or seven URLs that were on the page, then running a FactRank query, which brought in its fact results in just a few seconds. As Pedersen explained, users could run something similar on a long article (or several long articles about the same subject), and FactRank's algorithm would be able to provide a fact summary in short order.
Mobile support is not launching on Tuesday, but that is where the company expects to see the most development. "Our analysis shows that mobile devices are a prime target for this technology because the latency produces a lot of resistance in the browse experience," said Pedersen. Instead of a user just getting back a link dump of all the URLs it finds, the FactRank engine will go out, process those results, then deliver a summary of the best selection of facts--a move that will save the end user from having to wait for any extra pages to load.
If you want to give some of the third-party Factery Labs tools a run, you can find them on the company's implementations section. There you'll also find a test search engine that's running off of Twitter's index.
Bing's Twitter search starts with a zeitgeist view.
(Credit: Screenshot by Rafe Needleman/CNET)
Microsoft is getting into the real-time search business, as we reported earlier Wednesday from the Web 2.0 Summit. It's good to see a mainstream product dive into this stream, as one of the big issues with searching Twitter is that timeliness can swamp relevancy.
Bing has the opportunity to leverage its well-developed search engine chops to address this--not only will public tweets show up in search results, but Bing can also rank results based on the relevance of the post, the popularity of the writer, and other, more complex factors.
Uncharacteristically for Microsoft, the new search feature went live shortly after the announcement. (We're told the Facebook integration, which was also announced, will be rolled out in the future.) Here's how Twitterized Bing works for users so far:
The main page gives a nice overview of trending topics with a search cloud at the top of the page and a list of popular links that are being shared below it. It's a good way to get a sense of the buzz on Twitter at any moment.
Bing's basic Twitter search result page gives you both live tweets (at the top), and shared links (below).
(Credit: Screenshot by Rafe Needleman/CNET)
Search results pages themselves are likewise split into two sections, a live feed at the top with just four tweets, and a list of shared links at the bottom. Results stream in live at the top of the page, but you can pause the influx.
The "Best match" search juggles the display order to put relevant tweets up top, even if they're not the most timely.
(Credit: Screenshot by Rafe Needleman/CNET)
If you click on the link to "see more tweets" on the main result page, you get a full page of tweets on your query, with the interesting option to sort the results by "Best match." If you choose this, Bing takes a stab at ranking results based on their content and possibly other factors, like popularity and online status of the writer.
Timeliness is still a factor in "Best match" results, so you won't get day-old tweets at the top of the list on a hot topic, but adding a relevancy sort on top of that does make the search results more useful. This is especially true for hot topics where tweets feeding into a time-only sort can end up pushing useful and relevant content right off the page.
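Microsoft hasn't said how "Best match" is computed, so the following Python sketch only illustrates the trade-off described above: a score that blends keyword relevance, freshness, and writer popularity, compared with a pure time sort. The weights, the one-hour half-life, the use of follower counts as a popularity signal, and every name in the code are assumptions.

```python
import math


def best_match_score(tweet, query_terms, now, half_life_s=3600.0):
    # Relevance: fraction of query terms that appear in the tweet text.
    words = set(tweet["text"].lower().split())
    relevance = len(words & query_terms) / len(query_terms) if query_terms else 0.0
    # Freshness: halves every half_life_s seconds, so old tweets fade out
    # gradually rather than being cut off outright.
    age = max(0.0, now - tweet["timestamp"])
    freshness = 0.5 ** (age / half_life_s)
    # Popularity: log-scaled follower count, capped at 1 (assumed signal).
    popularity = min(math.log10(1 + tweet["followers"]) / 7.0, 1.0)
    # Assumed weights; a real engine would tune these.
    return 0.5 * relevance + 0.3 * freshness + 0.2 * popularity


def sort_tweets(tweets, query, now, mode="latest"):
    if mode == "latest":
        # Time-only sort: newest first, regardless of content.
        return sorted(tweets, key=lambda t: t["timestamp"], reverse=True)
    terms = set(query.lower().split())
    return sorted(tweets, key=lambda t: best_match_score(t, terms, now), reverse=True)
```

With `mode="latest"`, a brand-new off-topic tweet beats a slightly older on-topic one. With any other mode, the on-topic tweet can rise above it, while the freshness term still keeps day-old posts from dominating a hot topic.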
Bing unpacks short URLs to show you what people are sharing, with no surprise links.
(Credit: Screenshot by Rafe Needleman/CNET)
Back on the main result page, there are links related to your search query. These are automatically unpacked from URL shorteners like Bitly. Under each link result are the tweets that included a short link to the page, even if different shorteners were used to get there. Bing's Twitter search thus does a good job of pulling commentary together on a topic (a link) even from people who've never communicated with each other on the service.
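The grouping described above can be sketched in a few lines of Python: expand every short link to its destination, then collect tweets under that destination so that posts using different shorteners land together. This is an illustration, not Bing's implementation. The `expand` callable is a hypothetical resolver; a real one would follow HTTP redirects, while the usage below substitutes a lookup table with made-up example URLs.

```python
import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")


def group_tweets_by_link(tweets, expand):
    """tweets is a list of (author, text) pairs.

    expand maps a short URL to its final destination (hypothetical resolver).
    Returns a dict of destination URL -> list of (author, text).
    """
    groups = defaultdict(list)
    for author, text in tweets:
        for short_url in URL_PATTERN.findall(text):
            # Drop punctuation that trails the link in running text.
            short_url = short_url.rstrip(".,;:!?)")
            groups[expand(short_url)].append((author, text))
    return dict(groups)


# Stand-in for a real redirect-following resolver (made-up URLs).
table = {
    "http://short-a.example/x1": "http://news.example.com/story",
    "http://short-b.example/q9": "http://news.example.com/story",
}
tweets = [
    ("alice", "Worth reading http://short-a.example/x1"),
    ("bob", "Disagree with this: http://short-b.example/q9."),
]
groups = group_tweets_by_link(tweets, lambda url: table.get(url, url))
```

Here both tweets end up under the same destination URL even though the authors used different shorteners and never referenced each other.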
None of what Bing does with Twitter is startlingly new. Twitter's own search gives great real-time Twitter results. Other engines like Twazzup and Scoopler combine relevancy rankings into their results. And OneRiot does a very good job with shared links. But it is good to see real-time content start to bleed into mainstream search. It could be useful and relevant for everyone.
But this story won't get truly interesting until the real-time feeds, from Twitter and elsewhere, start to infect the mainstream Web search results. When a trending topic or popular shared link on Twitter starts to change the way standard results are ranked, we'll start to have truly real-time search for all content. Twitter will have an impact across the Web, even for people who never use it.
Twitter messages from prominent writers like All Things D's Kara Swisher are now in Bing search results.
(Credit: Bing)
Microsoft is trying to get a leg up in the real-time search wars by adding Twitter messages to search results.
Bing will now surface results for certain celebrities (leading to the odd pairing of search guru Danny Sullivan and American Idol host Ryan Seacrest in the same sentence) when users search on their names and "twitter," the company announced Wednesday afternoon. It's not indexing all of Twitter, instead picking "a few thousand people to start" and using Twitter's public API to display those results in a special box among the other search results, such as stories that a person might have written about Twitter.
Amid all the nauseating Twitter adoration of late lies a real trend within the search community: the desire to display search results that contain items from real-time communication services. Right now, this is done haphazardly by the Big Three, although smaller companies are trying to offer this service for those who just can't wait.
Both Google and Yahoo, for example, will return the main Twitter page and a single tweet as the top two search results for "Ryan Seacrest Twitter." They don't call out multiple tweets within a single defined box, as Bing will now do with the new feature.
Bing's Twitter feature is rolling out slowly over the day on Wednesday.
Collecta is a new real-time search engine that taps into Twitter, Flickr, blog comments, and news sites--all at once. Users are able to quickly filter which sources they want to search from, and can leave multiple searches running continuously, so that the latest content keeps rising to the top.
What I really like is that you can just leave it running in the background, and come back to check on searches throughout the day. I often do the same thing with TweetDeck with Twitter searches, but what's nice about Collecta is that it's grabbing search results from multiple sources.
Another nice feature is that you can preview most types of content without leaving the results. This is more useful for photos and blog posts than messages from microblogging services like Twitter, Jaiku and Identi.ca--the three it culls from. It also only captures images from Twitter, but I expect it to add more in the future.
Collecta lets you keep multiple searches running in the background. You can hop between them any time, and filter the sources to choose where you want it culling from.
(Credit: CNET)
Where these tools still have a ways to go is in weighing content from certain sources more heavily than others, and helping to weed through some of the duplicate entries. Competitor OneRiot does a good job at this by choosing relevancy based not only on time, but also on which results have been shared more heavily among other users at a specific source.
Note: At the time of publishing this post, the site is no longer functioning correctly. When I began to use the service right after its launch it was coming up with results almost instantly, and now it's just continuously searching without showing any results. Since I've seen it work so quickly, I'm willing to chalk it up to some launch jitters, but you may have to wait until later in the day to give it a proper go.
See also Wednesday's launch of Twitter search engine CrowdEye, which was made by the former head of Microsoft's search unit.




