OpenCalais, a Thomson Reuters project to improve electronic publishing by adding computer-readable labels to content, has attracted the attention of several media publishing organizations, including CNET.
The OpenCalais product, available in a free or a more sophisticated paid form, adds labels to content through a technology called semantic analysis. With descriptive labels, computers can, at least in theory, understand what they're processing beyond the raw text of a news story or photo caption, for example by recognizing addresses or names.
CNET, publisher of CNET News, is using OpenCalais' service to augment its product reviews and news, the companies plan to announce Thursday. CNET will use the technology to improve features such as searching, spotlighting content related to what a reader is viewing, and enabling programmatic use of its content over the Web.
Others using the technology include two other online news sites, The Huffington Post and DailyMe. DailyMe automatically sends its content through OpenCalais' servers, which label the content with categories such as people, medical conditions, or companies, and with the specific entities in those categories, said Neil Budde, DailyMe's president and chief product officer.
"It allows us to build [a] picture of [a] news user's behavior to implicitly personalize the site for them," Budde said, adding that automated personalization features are scheduled to arrive in about a month. The company also plans to license its service to other news sites, he added, and to improve advertising targeting based on the same personalization information.
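Budde doesn't describe how DailyMe's personalization works. One simple way implicit personalization could use entity labels is sketched below, assuming each story arrives tagged with (category, entity) pairs. The story IDs, labels, and the functions `interest_profile` and `rank_unread` are hypothetical, and the scoring (a plain overlap count) is an illustrative choice, not DailyMe's method.

```python
from collections import Counter

# Hypothetical tagged stories: each maps to a set of (category, entity) labels,
# standing in for the kind of output an entity-tagging service might return.
STORIES = {
    "s1": {("Company", "Thomson Reuters"), ("Person", "Neil Budde")},
    "s2": {("Company", "Yahoo"), ("Company", "Google")},
    "s3": {("MedicalCondition", "influenza")},
    "s4": {("Company", "Google"), ("Person", "Marissa Mayer")},
}


def interest_profile(read_ids, stories):
    """Count how often each label appears in the stories a reader has opened."""
    profile = Counter()
    for story_id in read_ids:
        profile.update(stories[story_id])
    return profile


def rank_unread(read_ids, stories):
    """Order unread stories by how strongly their labels overlap the reader's profile."""
    profile = interest_profile(read_ids, stories)
    unread = [sid for sid in stories if sid not in read_ids]
    # Score = sum of profile counts for the story's labels; ties are broken by story ID.
    return sorted(
        unread,
        key=lambda sid: (-sum(profile[label] for label in stories[sid]), sid),
    )
```

A reader who has opened only `s2` would see `s4` ranked first, because it shares the `Google` label. A real system might also weight recency and discount very common entities.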
A closely related technology, the semantic Web, in which elements of Web pages are labeled with computer-readable coding to help computers better understand the meaning of the content, has been around for years. Only now is it beginning to gain real-world adoption, though, and for two big reasons: Yahoo and Google.
A year ago, Yahoo announced that its search engine had begun recording semantic Web tags and could spruce up those pages' appearance in search results through its SearchMonkey technology. Then, in May, Google announced a similar move covering both the indexing and the display of such pages in search results. OpenCalais, however, takes a different route: it helps create online content that search engines discover through their conventional means of analyzing text.
The tagging seems to help search engines find DailyMe's content and spotlight it in search results, Budde said. "We create a lot of topics pages on the fly based on entities that come in from Calais, and those get pretty good pickup through search engines definitely," he said.
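Building "topics pages on the fly" from entities amounts to inverting the story-to-entity labels into an entity-to-story index. A minimal sketch, assuming each story comes with a list of entity names; `build_topic_pages`, `slugify`, the `/topics/` URL scheme, and the `min_stories` threshold are all hypothetical and not taken from DailyMe's system:

```python
import re
from collections import defaultdict


def slugify(name):
    """Turn an entity name into a URL-friendly slug, e.g. 'Tim Berners-Lee' -> 'tim-berners-lee'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_topic_pages(tagged_stories, min_stories=2):
    """Invert (headline, entities) pairs into topic-page URL -> sorted headlines.

    Entities mentioned in fewer than `min_stories` stories get no page,
    so thinly covered topics don't produce near-empty pages.
    """
    index = defaultdict(list)
    for headline, entities in tagged_stories:
        for entity in entities:
            index[entity].append(headline)
    return {
        "/topics/" + slugify(entity): sorted(headlines)
        for entity, headlines in index.items()
        if len(headlines) >= min_stories
    }
```

The threshold is a design choice: each new tagged story can create or extend a page automatically, which is what makes the pages cheap to produce at scale.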
Paul Perry, The Huffington Post's chief technology officer, has begun using OpenCalais' service in the company's publishing system. When a story mentions a specific location or company, for example, the service suggests that editors associate the story with that geographic location or add that company's stock ticker, Perry said.
That explicit labeling makes it easier for local editors--Chicago so far is the only city with localized Huffington Post news, though more areas will arrive this summer--to spot geographically relevant information, he said. "For us, local is super important. We're doing a ton of work for it," he said.
Semantic technology fans will convene starting June 14 for the Semantic Technology Conference in San Jose, Calif., at which Thomson Reuters' Tom Tague is scheduled to deliver a keynote speech.
Google's adoption of Semantic Web-inspired technologies could finally help spur adoption of the long-discussed concept. In this example, the ratings assigned to a restaurant are shown.
(Credit: Google)

Although Google has traditionally downplayed the importance of the Semantic Web, the company took a noticeable step toward embracing it Tuesday.
Google introduced an enhancement to its search results at Searchology 2009 that uses technologies commonly associated with the Semantic Web, or the concept of making Web pages more discoverable and understandable to computers. Calling it "Rich Snippets," Google's Marissa Mayer and Kavi Goel demonstrated how Google is working with tech publishers (including CNET) to display new types of information in search results such as the number of stars assigned to a particular gadget.
The Semantic Web is a project spawned by World Wide Web Consortium director Tim Berners-Lee in hopes of making the Web smarter, more attuned to meaning, and able to deliver information with little human prodding. The idea is to move beyond the limits of common Web pages--which can display information to a Web browser but are unable to broadcast the meaning and significance of that information to search engines and other computers--theoretically making for more relevant search results.
It has proven harder to do in practice than in theory, however. Several formats have been evangelized for the Semantic Web, and without incentives to add any of them to their Web pages (let alone to bet on one over another), publishers haven't made much progress since Berners-Lee began advancing the concept in the late 1990s.
And that's especially true if the 800-pound gorilla of the Web isn't keen on providing those incentives. Google has appeared cold to the idea several times in the past, as recently as earlier this year, when Google researchers opined: "The first lesson of Web-scale learning is to use available large-scale data rather than hoping for annotated data that isn't available." In short: work with what you've got when it comes to organizing the Web.
But rival Yahoo has embraced Semantic Web technology with its SearchMonkey project, encouraging Web publishers to add data to their sites that helps improve the quality of search results. Even if it's had to make a few tweaks, it has started getting Web sites used to the idea of adding structured data, or semantic technology, to their code.
Google's embrace of such technology gives the Semantic Web an immediate boost, but it also raises a few initial worries. Google's heft on the Web means it can get companies that depend on the quality of its search results (such as CNET and Yelp, featured in the demonstration) to follow its lead, forcing their competitors to follow suit.
But publishers will need to be sure that Google has backed the right horse before they can feel fully confident in its approach. Google's Rich Snippets can decode two different Semantic Web technologies, RDFa and microformats, to produce its enhanced search results; both seem to be popular among those advocating this approach to Web development.
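To illustrate what decoding two formats means in practice, here is a minimal sketch using Python's standard `html.parser` that reads a rating expressed either as a microformat (an element with `class="rating"`, in the style of hReview) or as RDFa (a `property` attribute). The `v:rating` property name is an assumption modeled on a typical RDFa vocabulary prefix, the class name `RatingExtractor` is hypothetical, and the sketch only handles ratings written as plain text in a leaf element. It is not how Google's parser works.

```python
from html.parser import HTMLParser


class RatingExtractor(HTMLParser):
    """Collect rating values marked up as either a microformat or RDFa."""

    def __init__(self):
        super().__init__()
        self._capture = False  # True right after a rating element opens
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        # Microformat style: the value is flagged by a class name.
        # RDFa style: the value is flagged by a property attribute (assumed "v:rating").
        self._capture = "rating" in classes or a.get("property") == "v:rating"

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        # Take the first non-blank text inside a flagged element.
        if self._capture and data.strip():
            self.ratings.append(data.strip())
            self._capture = False
```

Both encodings end up in the same list, which is the practical benefit for a consumer such as a search engine: publishers can pick either format.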
The company said it would not penalize publishers who do not participate in the Rich Snippets program by declaring their Web pages less relevant, which theoretically makes this an opt-in program. But Google's Goel acknowledged that Web pages enhanced with Rich Snippets could see higher click-through rates, which would improve their relevance in Google's algorithm.
Backers of Semantic Web technology will likely find both something to cheer and cause for concern with Google's announcements on Tuesday. That seems par for the course when it comes to gauging the impact of decisions made by one of the most important companies in the tech world.
Web pioneer Tim Berners-Lee says he is making sure the Semantic Web will respect the privacy of online communications and allow people to control who can use their data.
The Semantic Web, an ongoing project overseen by the World Wide Web Consortium (W3C), seeks to enable the Web to intelligently interpret what people are seeking when they search the Net.
In one example, computers will data-tag photographs and combine those tags with information from a desktop calendar, so people can ask the Web what the people in the photograph were doing on a particular day.
However, researchers have warned that the combination of such personal information could lead to privacy compromises, including increased data mining.
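The cross-referencing in the photo example, and the reason researchers worry about it, comes down to a simple join on date and person. A toy sketch with made-up photo tags and calendar entries; `photo_tags`, `calendar`, and `what_were_they_doing` are hypothetical, and real Semantic Web data would typically be expressed in RDF rather than Python dictionaries:

```python
from datetime import date

# Hypothetical data: tags a computer might attach to photos, plus desktop calendar entries.
photo_tags = [
    {"photo": "IMG_001.jpg", "taken": date(2009, 6, 14), "people": {"Alice", "Bob"}},
    {"photo": "IMG_002.jpg", "taken": date(2009, 6, 15), "people": {"Alice"}},
]
calendar = [
    {"day": date(2009, 6, 14), "event": "Conference keynote", "attendees": {"Alice", "Bob", "Carol"}},
    {"day": date(2009, 6, 15), "event": "Team lunch", "attendees": {"Dave"}},
]


def what_were_they_doing(day, photos, events):
    """For photos taken on `day`, match each pictured person to that day's calendar events."""
    answers = []
    for photo in photos:
        if photo["taken"] != day:
            continue
        for event in events:
            if event["day"] == day:
                # Only people who are both in the photo and on the event's attendee list.
                for person in sorted(photo["people"] & event["attendees"]):
                    answers.append((photo["photo"], person, event["event"]))
    return answers
```

Neither data set alone says where Alice was on June 14; joined, they do. That inference is exactly what makes the combination useful and, without safeguards, a privacy risk.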
Berners-Lee, who is director of W3C and the person credited with creating the World Wide Web, told ZDNet UK this week that the teams working on the Semantic Web project are making sure privacy principles are included in its architecture.
Semantic Web technology "certainly" will enhance privacy, Berners-Lee said. "The Semantic Web project is developing systems which will answer where data came from and where it's going to--the system will be architectured for a set of appropriate uses."
Another principle of the Semantic Web is that people who make a Web request for personal information being held by third parties, such as companies and government agencies, will be able to see all the data those organizations hold on them, according to Berners-Lee.
"W3C wants to help make sure data use is appropriate," he said. "Sometimes, it's a serious question who should have what access" to information.
In addition, the project will include accountable data-mining components, which let people know who is mining the data, and its teams are looking at making the Web adhere to privacy preferences set by users. The whole project is geared toward privacy enhancement, Berners-Lee said. The teams "are building systems to be aware of different data uses," he said.
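Berners-Lee describes goals (provenance, appropriate use, accountable data mining) rather than a specific mechanism. As a rough illustration only, a data record could carry its source and permitted purposes and log every access attempt. The `TrackedDatum` class and its fields are hypothetical and are not a W3C design:

```python
from dataclasses import dataclass, field


@dataclass
class TrackedDatum:
    """A piece of personal data carrying its provenance and the uses its owner allows."""

    value: str
    source: str                   # where the data came from
    allowed_purposes: frozenset   # uses the data subject has agreed to
    access_log: list = field(default_factory=list)  # who asked, why, and whether it was allowed

    def use(self, requester, purpose):
        """Log every request (accountability), then enforce the allowed purposes."""
        permitted = purpose in self.allowed_purposes
        self.access_log.append((requester, purpose, permitted))
        if not permitted:
            raise PermissionError(f"{requester} may not use this data for {purpose!r}")
        return self.value
```

Logging refused requests as well as granted ones is what lets the data's subject see who tried to mine it, not just who succeeded.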
ZDNet UK spoke to Berners-Lee at an event at England's House of Lords designed to draw attention to the use of deep-packet inspection by Internet service providers and third parties. The technique intercepts data packets sent over the Internet to analyze their content, which Berners-Lee likens to the postal service opening the mail it is charged with delivering.
"When people built the Internet, it was designed to be a cloud," said Berners-Lee. "When routing packets, the system only looks at the envelope--it's an important design principle. Now people find out what you write in your letters."
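The envelope analogy maps onto the IPv4 packet layout: a router's forwarding decision is based on the destination address in the fixed 20-byte header, while deep-packet inspection reads past the header into the payload. A minimal sketch using Python's `struct` and `socket` modules; the function names are hypothetical, the addresses come from documentation-only ranges, and the checksum is left at zero because nothing here sends the packet:

```python
import socket
import struct


def make_packet(src, dst, payload):
    """Build a minimal IPv4 packet: a 20-byte header (no options) followed by the payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,         # version 4, header length of 5 32-bit words
        0,                    # type of service
        20 + len(payload),    # total length in bytes
        0, 0,                 # identification, flags/fragment offset
        64,                   # time to live
        6,                    # protocol number (6 = TCP)
        0,                    # checksum, left at zero in this sketch
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    return header + payload


def route(packet):
    """Read only the 'envelope': the destination address at header bytes 16-19."""
    return socket.inet_ntoa(packet[16:20])


def deep_inspect(packet, keyword):
    """Deep-packet inspection: skip the header and search the 'letter' itself."""
    header_len = (packet[0] & 0x0F) * 4
    return keyword in packet[header_len:]
```

`route` never touches the payload, which is the design principle Berners-Lee describes; `deep_inspect` only works by reading it.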
Tom Espiner of ZDNet UK reported from London.
Through an easier-to-use variation of its SearchMonkey technology, Yahoo search results now can spotlight videos, games, and documents that Web sites label as such with special coding.
(Credit: screenshot by Stephen Shankland/CNET)

SUNNYVALE, Calif.--Call it SearchMonkey Lite--an easier way for a Web site to spotlight its videos, games, and documents in Yahoo's search results.
Yahoo has been working to let publishers spotlight their content in its search results through a program called SearchMonkey, but the company has concluded the technology's power comes at the expense of ease of use. Now Yahoo is offering a lightweight way to use SearchMonkey that it hopes will make the service approachable to average Web page creators.
The company posted a blog entry with some basic text that can be tweaked and then inserted into Web pages. Once that's done, Yahoo's Web crawling software will recognize videos, games, and documents, which then can be shown prominently alongside the Web address in Yahoo's search results, said Tom Chi, senior director of product for Yahoo search, in an interview here at Yahoo headquarters.
"There's very little code required to engage with this," Chi said of the templates Yahoo is offering. "Adding that extra bit of structure helps those who might be less technically experienced."
Yahoo's Tom Chi
(Credit: Stephen Shankland/CNET)

Video results are appearing now, and games and documents should start appearing over the next couple of weeks, he said. However, Yahoo will phase in the results gradually to ensure they aren't being gamed or polluted with spam, he added.
Yahoo is trying to make its search more useful and therefore more used, part of its attempt to compete with search leader Google. The more search results are shown, the more opportunities the search provider has to show related advertising.
With this lightweight version of SearchMonkey, search results become more of a destination unto themselves. Right on the search page, the videos can be watched, the games can be played, and the documents can be read. (See screenshot below.)
SearchMonkey relies on Yahoo's search engine finding "structured" data on the Internet--Web sites whose elements have been labeled so computers can know when they've found an address, a video, or other particular types of information. That structured data is a crucial element of what's called the Semantic Web, a years-old idea that computers should be able to understand the meaning and not just the text of Web sites.
"We hope that through programs like this, it'll be possible for publishers to start getting engaged with the Semantic Web," Chi said.
Clicking the video on the Yahoo search page lets it be played directly in the search results.
(Credit: screenshot by Stephen Shankland/CNET)
When Yahoo announced BOSS (Build Your Own Search Service) in 2008, the company said it planned to make money from the service. On Wednesday, though, the company announced a new approach: charging for high-volume use of the search data.
Yahoo will charge for use of the BOSS API (application programming interface), the service through which other Web sites can extract Yahoo's search data and then repurpose it to their hearts' content, according to a blog post by Ashim Chhabra of Yahoo's Search BOSS team. Previously, the company had planned to make money from BOSS by requiring outsiders with high-traffic sites to show Yahoo search ads next to their results.
The new approach allows companies to pursue their own monetization strategies and will help make the API itself more useful by lifting constraints, Chhabra said.
"We're introducing fees for a couple of reasons. First and most importantly, we're hard at work on a number of technologies that will enhance both the functionality and performance of BOSS, and usage fees will help support this development," Chhabra said. "Second, we believe that introducing the proposed pricing structure will improve the ecosystem by optimizing capacity for our serious developers."
BOSS is one part of Yahoo's attempt to make its search more competitive with dominant rival Google, which gained market share over rivals in January, according to Nielsen Online.
One limit that will be lifted is the number of search results that can be retrieved with one call to the BOSS API; under the fee structure, that limit goes from 50 to 1,000. Yahoo also will offer a service level agreement (SLA) so outsiders can count on BOSS working.
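The per-call limit determines how many requests a developer needs to page through a large result set. A small sketch of that arithmetic, assuming results are fetched as (offset, count) pairs; `plan_requests` is a hypothetical helper, not part of the BOSS API:

```python
def plan_requests(total_results, per_call_limit):
    """Split a retrieval into (offset, count) pairs no larger than the per-call limit."""
    if per_call_limit <= 0:
        raise ValueError("per_call_limit must be positive")
    plan = []
    for offset in range(0, total_results, per_call_limit):
        # The last request may be smaller than the limit.
        plan.append((offset, min(per_call_limit, total_results - offset)))
    return plan
```

Fetching 1,000 results takes 20 calls at the old 50-result limit and a single call at the new 1,000-result limit.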
BOSS now can show Web sites' descriptive data spotlighted by Yahoo's SearchMonkey service.
(Credit: Yahoo)

The new fees likely will go into effect late in the second quarter, according to the BOSS fee page. Those who use the service will pay on the basis of 10-cent units; for example, retrieving the first 100 results for 1,000 searches costs 10 units. Developers will get 30 free credits a day, and the rate goes down during off-peak hours.
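The pricing example can be turned into a back-of-the-envelope estimator. This sketch takes the 10-cent unit and the example rate (10 units per 1,000 searches for the first 100 results) from the article; rounding partial units up, treating one free credit as one unit, and the function name `boss_cost_cents` are assumptions. Off-peak discounts are not modeled because no rate is given for them.

```python
UNIT_PRICE_CENTS = 10  # from the article: usage is billed in 10-cent units


def boss_cost_cents(searches, units_per_1000_searches=10, free_units=0):
    """Estimate a bill in cents for a batch of searches.

    The default rate is the article's example: the first 100 results for
    1,000 searches cost 10 units. Rounding partial units up and counting one
    free credit as one unit are assumptions of this sketch.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    units = (searches * units_per_1000_searches + 999) // 1000
    billable = max(0, units - free_units)
    return billable * UNIT_PRICE_CENTS
```

At the example rate, 1,000 searches cost 10 units, or $1.00; 5,000 searches cost 50 units, which drops to 20 billable units ($2.00) if the 30 daily free credits apply.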
SearchMonkey injection
Yahoo also announced it's grafting some SearchMonkey technology into BOSS. SearchMonkey can gussy up certain Yahoo search results in cases where the listed Web sites describe their own data with computer-oriented labels called microformats, such as a restaurant marking up its address. This idea, called the "semantic Web" and long under development, theoretically gives computers a better understanding of what's on Web pages.
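For the restaurant-address case, the relevant microformat is hCard: an element with `class="adr"` wraps sub-properties such as `street-address`, `locality`, and `region`. A minimal extractor using the standard library's `html.parser` follows. For brevity it matches those class names anywhere in the page rather than checking that they sit inside an `adr` element, and `AdrExtractor` is a hypothetical name, not Yahoo's crawler.

```python
from html.parser import HTMLParser

# hCard address sub-properties this sketch looks for.
ADR_FIELDS = {"street-address", "locality", "region", "postal-code", "country-name"}


class AdrExtractor(HTMLParser):
    """Pull address parts out of hCard-style markup into a dict."""

    def __init__(self):
        super().__init__()
        self.address = {}
        self._field = None  # address property of the element just opened, if any

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        matched = classes & ADR_FIELDS
        self._field = matched.pop() if matched else None

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        # Record the first non-blank text inside an address-property element.
        if self._field and data.strip():
            self.address[self._field] = data.strip()
            self._field = None
```

Because the labels are explicit class names, the extractor never has to guess which text is a street and which is a city, which is the kind of certainty SearchMonkey relies on.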
The BOSS API now can be set so that search data it retrieves spotlights that structured data, Chhabra said.
BOSS now also shows two other elements: longer 300-character descriptions of each page in search results, up from 170 characters, and some data retrieved by Yahoo's SiteExplorer technology, which can show details such as popular pages within a particular Web site or a list of other Web sites that link to it.