Google has penalized its own Japanese site, reducing its clout in search results, after a promotion there violated the company's search policies.
Busted: Google slapped down the PageRank score for its Japanese Web site. (Credit: Google)
Earlier this week, Google canceled a promotion in Japan that paid bloggers to write about a new feature that showed popular new search terms on Google's Japanese home page. Now Google is administering the same punishment to its Japanese site that it hands out to others who similarly violate its policies.
"Google.co.jp PageRank is now ~5 instead of ~9. I expect that to remain for a while," said Matt Cutts, who leads Google's efforts to screen bogus Web sites from Google's search, in a Twitter post on Wednesday. A site with a higher PageRank score gets more prominence in search results and can boost the prominence of other sites to which it links.
Through Google's PageRank algorithm, a site to which many other sites link gets higher placement in search results, and Google works hard to ensure its search results aren't skewed by sites with inappropriate paid links. Such links would let sites simply buy higher placement; Google wants the quality and relevance of a site's content to determine where it ranks.
(Via Google Blogoscoped.)
It's a pity the National Security Agency can't talk about its computational challenges, because it's leaving a lot of the boasting rights to Google.
(Credit: Paul Ford)
In a blog post on Friday, the company shared some details about the challenges of one aspect of its search operation, the Web indexing and processing that must take place before results are delivered to users. The short version: Google has no choice but to think big.
First comes surfing. "We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links," said software engineers Jesse Alpert and Nissan Hajaj. "Even after removing...exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day."
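The surfing step the engineers describe amounts to a breadth-first traversal of the Web's links. The minimal Python sketch below illustrates only that idea. The toy `links` map is hypothetical and stands in for downloading and parsing real pages, and everything else a production crawler must handle is left out.

```python
from collections import deque


def crawl(seeds, links):
    """Discover URLs breadth-first, starting from a set of seed pages.

    `links` is a hypothetical in-memory map {url: [outgoing urls]} that
    stands in for fetching and parsing real pages.
    """
    found = list(dict.fromkeys(seeds))  # unique URLs in discovery order
    seen = set(found)                   # each exact URL is recorded only once
    frontier = deque(found)
    while frontier:
        page = frontier.popleft()
        for target in links.get(page, []):
            if target not in seen:      # drop exact duplicates
                seen.add(target)
                found.append(target)
                frontier.append(target)
    return found


# Hypothetical toy web: "a" plays the role of a well-connected seed page.
toy_web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
print(crawl(["a"], toy_web))  # ['a', 'b', 'c', 'd']
```

The `seen` set is what removes exact duplicates. At the scale of a trillion URLs, that bookkeeping could not live in one process's memory the way it does here.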
Next comes analyzing the "link graph"--the mathematical representation of what links to what. That's a key foundation of Google's PageRank algorithm, which brought the company's search engine to prominence by assigning importance to those pages that other important pages point toward.
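The idea of importance flowing from important pages can be sketched with the textbook power-iteration form of PageRank. This is a minimal illustration, not Google's production system. It assumes a small hypothetical in-memory link graph and a damping factor of 0.85, the value suggested in the original PageRank paper. It also spreads rank from pages with no outgoing links evenly over all pages, which is one common convention.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
        rank = new
    return rank


# "a" is linked to by both "b" and "c", so it ends up ranked highest.
toy_links = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = pagerank(toy_links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['a', 'b', 'c']
```

Recalculating the link graph "several times per day" means rerunning something of this shape over a trillion nodes rather than three.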
In the early days of Google, computing PageRank for the company's collection of a mere 26 million pages took a workstation "a couple hours," and the results would be used for some unspecified period of time. Today, Google surfs the Web continuously and recalculates the link graph "several times per day."
"This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections," the engineers said.
Google likes to talk about how users have choice and competition just one click away, and that's a fair point. But the blog post also makes it even clearer just how high barriers to entry are in the search market. That's one of the reasons Yahoo's BOSS (build your own search service) program is intriguing: it lets search start-ups take advantage of Yahoo's crawling, indexing, and search technology in exchange for advertising or revenue-sharing partnerships.
Though a distant third to Google in search, Microsoft thinks it can teach its rival a thing or two about searching the Internet.
A big part of Google's rise to search engine leadership was an algorithm called PageRank, which assesses a page's importance by how many other Web pages link to it and by the importance of those linking pages. This week, though, Microsoft researchers and academic collaborators detailed an idea they call BrowseRank, which seeks to bring more of a human touch to that assessment.
Microsoft likes the results of BrowseRank, which assigns Web page priority based on how people actually use sites. (Credit: Microsoft Research Asia)
Essentially, the researchers tested a system that replaces PageRank's link graph--a mathematical model of the hyperlinked connections of the Internet--with what they call a user browsing graph, which ranks Web pages by people's behavior.
"The more visits of the page made by the users and the longer time periods spent by the users on the page, the more likely the page is important. We can leverage hundreds of millions of users' implicit voting on page importance," the researchers said in BrowseRank: Letting Web Users Vote for Page Importance, a paper from the SIGIR (Special Interest Group on Information Retrieval) conference this week in Singapore. Authors are Bin Gao, Tie-Yan Liu, and Hang Li from Microsoft Research Asia and Ying Zhang of Nankai University, Zhiming Ma of the Chinese Academy of Sciences, and Shuyuan He of Peking University.
Search is of tremendous importance to the Internet for many reasons. For one thing, search engines are highly influential middlemen that steer users to Web sites they may not be able to find on their own. For another, queries typed into search engines can be powerful--and in Google's case highly profitable--indications of what type of advertisement to place next to the search results.
But Microsoft lags leader Google and No. 2 Yahoo in search. It's trying hard to catch up, for example with unsuccessful multibillion-dollar proposals to acquire Yahoo or its search business. And Microsoft just bought search start-up Powerset.
Google isn't putting all its eggs in the PageRank basket, though.
"It's important to keep in mind that PageRank is just one of more than 200 signals we use to determine the ranking of a Web site," the company said in a statement. "Search remains at the core of everything Google does, and we are always working to improve it."
PageRank shortcomings
The Microsoft researchers argue that PageRank has a number of problems. For one thing, people can game the system by building bogus Web sites called link farms. Those sites feature hyperlinks pointing to a Web page whose importance someone wants to inflate so that it appears higher in search results. Another PageRank issue is that the indexing process doesn't take into account the time a user spends on a particular site.
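The link-farm problem is easy to reproduce on a toy graph. The sketch below reuses a textbook power-iteration PageRank with damping 0.85, which is an assumption and not Google's production setting. It runs on a hypothetical three-page web and then adds ten bogus pages whose only purpose is to link to one target page.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank from pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * rank[page] / len(targets)
        rank = new
    return rank


# An honest toy web in which nothing links to "target".
honest = {"target": ["x"], "x": ["y"], "y": ["x"]}

# The same web plus ten hypothetical link-farm pages that exist only to
# point at "target".
farmed = dict(honest)
for i in range(10):
    farmed[f"farm{i}"] = ["target"]

before = pagerank(honest)["target"]
after = pagerank(farmed)["target"]
print(f"{before:.3f} -> {after:.3f}")  # 0.050 -> 0.110
```

No real page links to the farm pages, so each one carries only the minimum rank. The boost comes purely from their number, and that is what makes the trick cheap to pull off.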
But user behavior, monitored in anonymous form by Web servers and Web browser plug-ins, can be a better guide, the authors argue.
"Experimental results show that BrowseRank can achieve better performance than existing methods, including PageRank...in important page finding, spam page fighting, and relevance ranking."
The researchers gathered their data from "an extremely large group of users under legal agreements with them," according to the paper.
There's no denying PageRank is useful, though, and such algorithms could be added into a larger formula for determining which sites come out on top of search results.
"It is also possible to combine link graph and user behavior data to compute page importance," the researchers said. "We will not discuss more about this possibility in this paper, and simply leave it as future work."
Bringing research to fruition
It can be a long time before research comes to fruition, but funding a group of researchers can be much less expensive than acquiring other companies. No doubt Microsoft, especially after years of effort and its thwarted overtures to Yahoo, would like to see its in-house search efforts bring Google to its knees.
When accused of being dominant, Google representatives often argue the company could lose its search dominance if somebody else builds a better mousetrap and Internet users divert their path to that other door. "If Microsoft or Yahoo are successful in providing similar or better web search results or more relevant advertisements, or in leveraging their platforms or products to make their Web search or advertising services easier to access, we could experience a significant decline in user traffic or the size of the Google (ad) Network," it said in its most recent quarterly report.
The top players are a moving target, though. Yahoo is hoping to improve search with three efforts: BOSS (build your own search service), which lets others employ Yahoo search results along with its search ads; SearchMonkey, which lets content publishers build elaborate mini-Web pages into search results; and Glue Pages, which present a smorgasbord of related content alongside search results.
And Google invests heavily, too. Its biggest research team is devoted to search, and the company updated its search formula more than 100 times in the second quarter. And researchers have huge infrastructure at their disposal to try new ideas.
"My group at Google has at its disposal many thousands of machines, with storage measured in petabytes," Udi Manber, head of Google's search quality, said of Google's search research infrastructure in a June talk. And, he added, engineers are empowered to try their results, with meetings once or twice a week to see how well they worked: "There is no separation of research and development. Everyone does both."
Google headquarters in Mountain View, Calif. (Credit: Stephen Shankland/CNET News.com)
Google has been sharing more about how its search engine works, and on Wednesday we got another installment in a series of blog posts: details of Google ranking.
The post--the second by Amit Singhal--will be familiar to close watchers of Google or to those who have spent time listening to recent executive speeches from Marissa Mayer, Google's vice president of search and user experience, and others. But it's still worth a read: it sheds some light on a process that many people probably see either as imponderable magic or as simpler than it really is.
In short, Singhal describes various parts of the search problem. Google must have an "understanding" of the pages it indexes, of the queries people type into the Google search page, and of attributes of the searcher, such as the region the user is searching from.
Singhal indulges in a little self-congratulation about the quality of Google search results. But like Google's chief engineer in charge of search quality, Udi Manber, he also takes pains to emphasize that "search is nowhere close to being a solved problem."
As part of Google's effort to shed a bit more light on its search work, the company on Wednesday detailed some of the process it uses to order the results its search engine produces.
The most interesting element of the post by Amit Singhal, a Google fellow who oversees the area, is a discussion of why the company doesn't manually elevate particular search results to obtain the right order. However, the company does of course hand-tune the algorithm that ranks the results, so you can consider manual intervention still relevant at a higher level.
Google gives two reasons for its prohibition against manual intervention. First is its belief that its own individual judgment is never as good as the collective judgment of the Internet overall, whose hyperlink structure forms part of the basis for Google ranking.
Second, fixing the algorithm rather than a specific result, if done right, helps more than just one particular search. "Often a broken query is just a symptom of a potential improvement to be made to our ranking algorithm. Improving the underlying algorithm not only improves that one query, it improves an entire class of queries, and often for all languages," Singhal said.
Though the company has talked earlier about how it doesn't hand-tune specific search results, Singhal went into a little more detail. Not a lot, though: the post is more of a teaser that lays some groundwork, but Singhal promised more later.