Google has cut the clout its own Japanese site has in search results after a promotion there violated the company's own search policies.
Busted: Google slapped down the PageRank score for its Japanese Web site.
(Credit: Google)
Earlier this week, Google canceled a promotion in Japan that paid bloggers to write about a new feature that showed popular new search terms on Google's Japanese home page. Now Google is administering the same punishment to its Japanese site that it hands out to others who similarly violate its policies.
"Google.co.jp PageRank is now ~5 instead of ~9. I expect that to remain for a while," said Matt Cutts, who leads Google's efforts to screen bogus Web sites from Google's search, in a Twitter post on Wednesday. A site with a higher PageRank score gets more prominence in search results and can boost the prominence of other sites to which it links.
Through Google's PageRank algorithm, a site to which many other sites link gets higher placement in search results, and Google works hard to ensure its search results aren't skewed by sites with inappropriate paid links. Such links would let sites simply pay for higher placement; Google wants the quality and relevance of a site's content to determine it.
(Via Google Blogoscoped.)
It's a pity the National Security Agency can't talk about its computational challenges, because it's leaving a lot of the boasting rights to Google.
(Credit: Paul Ford)
In a blog posting on Friday, Google shared some detail about the challenges of one aspect of its search operation, the Web indexing and processing that must take place before the results are delivered to users. The short version: Google has no choice but to think big.
First comes surfing. "We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links," said software engineers Jesse Alpert and Nissan Hajaj. "Even after removing...exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day."
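The link-following step the engineers describe can be sketched as a breadth-first traversal. This is only a toy illustration, not Google's crawler: the `TOY_WEB` dictionary, the `.example` URLs, and the `crawl` function are hypothetical stand-ins, and a real crawler would fetch pages over the network rather than look them up in memory.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(seeds, fetch_links):
    """Breadth-first crawl: return every unique URL reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for link in fetch_links(url):
            if link not in seen:  # skip URLs that were already discovered
                seen.add(link)
                queue.append(link)
    return seen


found = crawl(["a.example"], lambda url: TOY_WEB.get(url, []))
print(sorted(found))
```

Starting from a single seed page, the traversal reaches all four toy URLs, and the `seen` set is what keeps exact duplicates out of the list.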
Next comes analyzing the "link graph"--the mathematical representation of what links to what. That's a key foundation of Google's PageRank algorithm, which brought the company's search engine to prominence by assigning importance to those pages that other important pages point toward.
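To make the idea concrete, here is a minimal sketch of the textbook form of PageRank, computed by repeated iteration over a tiny link graph. It is an assumption-laden illustration rather than Google's production system: the 0.85 damping factor is the value commonly used in published descriptions, and the `LINKS` graph and page names are invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph: three pages point at "hub".
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "hub" ends up with the highest score because three pages link to it, and "c", which nothing links to, ends up with the lowest. The two pages that "hub" links to inherit part of its importance.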
In the early days of Google, computing PageRank for the company's collection of a mere 26 million pages took a workstation "a couple hours," and the results would be used for some unspecified period of time. Today, Google surfs the Web continuously and recalculates the link graph "several times per day."
"This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections," the engineers said.
Google likes to talk about how users have choice and competition just one click away, and that's a fair point. But the blog post also makes it even clearer just how high barriers to entry are in the search market. That's one of the reasons Yahoo's BOSS (build your own search service) program is intriguing: it lets search start-ups take advantage of Yahoo's crawling, indexing, and search technology in exchange for advertising or revenue-sharing partnerships.
Q&A Search has become central to the functioning of the Internet, but Udi Manber isn't the kind of person who takes that for granted.
"I don't have to tell anybody around here that search is important. That's a very nice luxury to have," said Manber, the Google vice president in charge of search quality.
Udi Manber, Google VP, engineering
(Credit: Google)
Search quality may seem like an unassuming element of Google's operations, but in fact it's at the core. Manber oversees the company's search algorithm--all the different inputs Google weighs to judge which Web sites to rank highest in search results.
Manber's work has been highly secret, partly because search is central to Google's competitive advantage and partly because Google doesn't want people gaming the system to get artificially prominent results. But the company has begun sharing a smidgen, including an opening blog post by Manber in May. I talked to him at Google headquarters recently.
How mature is search today on the Internet? Are we 5 percent of the way done with the problem? Ninety percent?
My best analogy is that a 15-year-old thinks he's very mature. A 19-year-old thinks he's extremely mature. Every few years you learn that you were not mature before. Search on the Web is about 15 years old, and obviously we are much more mature than we were 5 years ago and 10 years ago and 15 years ago. One way to put it is that it's science fiction every 5 years. What's possible today to me was science fiction 5, or definitely 10 years ago. What was (ordinary) 10 years ago was science fiction 15 years ago. The development is really pretty amazing. It surprised even me. I expect a certain level of progress, and we're actually surpassing it.
You were at the University of Arizona, then Yahoo and Amazon, then A9, then you moved to Google in 2006. Is there anything you've learned from looking at it from different perspectives, or have you been just tackling the same thing with different phone numbers on your business card?
It's the same problem, and I've looked at it from many different angles. It's bigger here, and it's better here. We have a team that's beyond any other team I've ever been with. We put more resources into it. I don't have to tell anybody around here that search is important, and that's a very nice luxury to have.
I remember the old days of AltaVista and HotBot and WebCrawler and some of these other search engines--days when search was really very primitive.
I remember starting those things. They looked very sophisticated and mature at the time, which is my point about the 15-year-old.
It's clearly become a lot more usable. But even 10 years ago, everybody hadn't been trained to think the way we get information is we go to a search box and type something in. Now that seems abundantly obvious. What 10 years from now is going to look as stunningly obvious as having a search box is today?
It was clear to some people. I don't want to brag too much, but it was clear to me. That's why I moved to search in the early 1990s, because everybody was talking about the information revolution. It was very clear that to have an information revolution, it's not enough to store the information and move it around, you have to find it. I know a lot of people at the time who were talking in those terms--that's going to be the revolution. The ability to find things among huge amounts of information is the key factor. So while nowadays it's completely obvious, even 6 or 7 years ago it was not obvious. I think the reason Google is so successful now is because it was obvious to (co-founders) Larry (Page) and Sergey (Brin) 10 years ago, they put in all the effort, and they're still doing it now.
Don't take that for granted. It was not that well understood, but it was understood by some people. When I started working on search when I was in academia and I said I'm working on search, they looked at me and said, "What do you mean you're working on search? Did you lose something?" In the early 1990s, even, very few people worked on search, because search was done by professionals in various limited domains. There was legal search, there was medical search, there was chemical search, and some limited news search. And it was done by a searcher--professional people. You tell them, "This is what I want to find," and they find it for you. I went to trade conferences with searchers. The idea that people will do the search themselves--that it'll democratize the whole thing and you don't have to go to a professional--that's the revolution.
I think that'll advance much more because you'll do more searches. There are a lot of things you don't search for now, because you don't expect Google will know or that the search engine will find out. We are finding that user expectations grow. The kind of searches people do now are more complicated than the kinds they were doing five years ago. People expect a lot more from us.
Ten years ago, if you actually found an answer to some specific question, it was, "Hey, look at this, it's so cool!" It was an event. Nowadays if you don't find exactly what you want in the first or second result, something is wrong. That's nice. The expectation is that we'll do it.
Google has concluded it's been a little too secretive about the inner workings of its search engine.
The company has deliberately stayed mum about the algorithm that decides what to put at the top of the search results list, in part because the company doesn't want competitors copying it and in part because it doesn't want Web sites gaming the system, said Udi Manber, the vice president of engineering in charge of search quality, in a blog post Wednesday. Now, though, it plans to share a little more.
Udi Manber, Google VP, engineering
(Credit: Google)
"Being completely secretive isn't ideal, and this blog post is part of a renewed effort to open up a bit more than we have in the past," Manber said. "We will try to periodically tell you about new things, explain old things, give advice, spread news, and engage in conversations."
The blog post mostly just outlines the search quality effort, so we'll have to wait for future posts for the real dirt. But Manber gives a glimpse of some of the factors--internally called inputs--that Google weighs.
"The most famous part of our ranking algorithm is PageRank, an algorithm developed by Larry Page and Sergey Brin, who founded Google. PageRank is still in use today, but it is now a part of a much larger system. Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing)."
In 2006, Google hired Manber from Amazon.com, where he led the company's A9 search engine work. Before that he worked at Yahoo.
Manber also said humans and automated tools constantly evaluate how well Google is doing, and the company constantly rolls changes into the search algorithm--Google made 450 changes to its algorithm in 2007, he said. Some were minor, such as correctly understanding acronyms in Hebrew, and some were major, such as a big change to PageRank in January.
The company also has begun opening up a bit to the press. It shed some light on search challenges during its Google Factory Tour event Monday, and I plan to publish a Q&A with Manber shortly.
If you want to watch a bunch of A-list bloggers and business folks at big-name news sites go a little ape, I recommend observing them when their Google PageRank takes a hit.
According to blogger Andy Beard, a number of high-profile blogs and news sites have had just that happen to them in recent days.
Some examples, according to Beard, include Engadget, which saw its PageRank drop from 7 to 5; Joystiq, from 6 to 4; and SFGate, Forbes.com and WashingtonPost.com, all of which had their PageRank drop from 7 to 5.
What's behind this?
Well, speculation in the blogosphere today has it that Google has decided to punish popular sites that accept paid links to lesser sites. As Valleywag puts it, "Google's bean counter, naturally, would prefer that you pay Google for sponsored links instead."
I'm working on getting comment from Google, but so far no luck. I'll update this post if I do get some comment.
Anyway, part of the buzz about this move is that some of the sites that are taking PageRank hits are the very sites (Search Engine Journal, Copyblogger, Search Engine Guide and the Blog Herald, among them) that cover search engine optimization issues, and some suspect that perhaps the search giant is punishing them for being critics.
Is that possible? Well, who knows?
But as Beard points out, not all the sites that saw PageRank losses engage in the practice of selling paid links. Instead, many of them are part of blog networks that have plenty of internal links between sites. Engadget, for example, is part of the Weblogs Inc. network.
The guidelines that these sites may have abused? "Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to Web spammers or 'bad neighborhoods' on the Web, as your own ranking may be affected adversely by those links."
The real question is, what has changed? It's hard to imagine that all these sites suddenly changed their practices overnight. So for all these sites' PageRank rankings to have changed at once does indicate somebody over at Google is playing with slide rules or something.
Grayboxx is a local recommendations service that's been quietly humming along since 2005. This morning it added 100 cities to the network, bringing the grand total up to 175. Grayboxx takes aggregate customer reviews from all over, and combines them by neighborhood to serve up business recommendations, kind of like what Google has done with its search results. Grayboxx will scour the Internet for references to a business (be it tagged photos, or mentions in a blog post), and give that business a certain rank based on how widely it is mentioned. However, unlike Yelp and Yahoo Local, which are designed and organized to feed off user reviews, Grayboxx's algorithm is completely automatic.
What makes the service particularly interesting is that it's largely unavailable in major U.S. cities right now. For instance, New York, San Francisco, Portland, and Seattle won't be getting the Grayboxx treatment until December, while many smaller towns are fully listed. Grayboxx's CEO has previously mentioned that the reason for this was to avoid head-to-head competition with other services like Yelp, while building up their technologies in smaller markets.
So what kind of stuff do you find doing a search on Grayboxx? For the most part, results are similar to what you'd see on other local search sites. There are addresses, hours of operation, phone numbers and any related Web sites. You also get neighborhood recommendations on the side of every listing, which will tell you if the service has a buzz. What was sharply missing in my testing, though, were user reviews of any sort. Grayboxx claims to pull in reviews from third-party sites (like the Yelps and Yahoo Locals of the world), although I couldn't find a single one in my two test cities. While there's space for them on each listing, you can't add your own two cents about the service directly.
I find more often than not that user reviews can be the most helpful part of a business listing when it comes to looking for a recommendation. While services like Yelp and Yahoo Local offer mostly subjective reviews--and largely about food--it's the little things that matter, like which dishes are the best, or important information like times to avoid a place when it's too busy or too quiet. While I don't doubt the interesting new direction Grayboxx is moving toward, I think the user-generated quotient is critical and will remain king.
Don't let the Digg-like counters fool you; those numbers are automatically generated by Grayboxx, the local listing recommendation service.
(Credit: CNET Networks)





