Google's Webmaster Central has become a very important resource for anyone who has a Web site, works on a Web site, or, like SEO practitioners, helps others with their Web sites.
Google continues to roll out new features and improve existing ones, and it has now done a little of both by adding a Generate robots.txt function.
Google had previously added a robots.txt analyzer, which at this point is still the more useful of the two tools. For those who aren't aware, the robots exclusion protocol helps with instructing search engines how to interact with a Web site. There are a number of directives available, but the main purpose of the robots.txt file is to instruct the search engines about content that a site owner doesn't want the robots to crawl.
Why in the world would you not want search engines to crawl any of your content? You may have content that, for whatever reason, you don't want others to find through search results. Note, however, that this is not the same as secure information that requires authentication through a log-in.
Your site may have its own search function that creates "search results" for your site. Search engines generally do not want to include search results within search results, so this content may not be returned for searches on the engines anyway. You might therefore want to focus the crawlers elsewhere for greater crawl efficiency.
Or you may have duplicate content issues that you could use robots.txt to filter out. This is especially common with a content management system (CMS) that creates a separate printer-friendly page.
Regardless of your specific needs, having a robots.txt file can be important to a site. Rarely is there a site that can't benefit from disallowing at least some content. Even if you have nothing to disallow, you may want to take advantage of the auto-discovery feature for your XML sitemap. Finally, depending on your server log system or analytics package, not having a robots.txt file can be problematic if it inflates your "404 File Not Found" error reporting, which can happen because search engine spiders will request the robots.txt file automatically when they come to your site.
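As a rough illustration of the cases above, here is a minimal Python sketch that builds a small robots.txt and checks it with the standard library's urllib.robotparser module. The paths /search and /print/, the sitemap location, and the example.com URLs are all hypothetical; substitute whatever your own site or CMS actually generates. Keep in mind that each engine's crawler may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of internal search results and
# printer-friendly duplicates, and advertise the XML sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /print/
Sitemap: http://www.example.com/sitemap.xml
"""

def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)
for url in (
    "http://www.example.com/articles/robots",
    "http://www.example.com/search?q=robots",
    "http://www.example.com/print/robots",
):
    verdict = "crawl" if rules.can_fetch("*", url) else "skip"
    print(url, "->", verdict)
```

Only the first URL should come back as crawlable under these rules; the search and printer-friendly URLs match the two Disallow lines.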
Right now, the robots.txt generator is rather basic and I hope that Google will add more features to it going forward. Currently, site owners have to paste in URLs and URL patterns to build the file. It would be great if it would provide a list of URLs or patterns extracted from a site to help automate the procedure for anyone not familiar with the protocol.
There is more information about the protocol, though a bit more on the technical side, at the robotstxt.org site and you can find more engine specific information on crawling and robots.txt from Google, Yahoo, MSN, and Ask.com.
One important tip is that the following directive tells all spiders they are allowed to go anywhere:
User-agent: *
Disallow:
Even more important is the following directive, which I sometimes see when I think people really wanted the one above:
User-agent: *
Disallow: /
The latter tells the spiders to stay out of the entire site--clearly two very different results, so be sure you understand which does what.
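If you want to confirm which is which before uploading a file, a quick check along these lines can help. This is only a sketch using Python's standard urllib.robotparser module; the example.com URL and the helper name allowed are my own placeholders, and real crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash: everything is blocked

page = "http://www.example.com/products/index.html"  # hypothetical page
print("Disallow:   ->", allowed(ALLOW_ALL, page))
print("Disallow: / ->", allowed(BLOCK_ALL, page))
```

The first check reports that the page may be fetched and the second reports that it may not, which is the one-character difference described above.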
Web site owners and SEOs alike often feel at odds with the search engines, but times are changing. This was often the case in the past when the engines made updates and changes to their algorithms that seemed to send Web sites into a SERP tailspin, leaving everyone scrambling to regain their precious page-one positions. The engines were also a lot less forthcoming with information and guidance, perhaps taking the view that giving this information gave too much power to the spammers and phishers.
While this view was understandable on the surface, it didn't float all that well in reality. In the real world, those who are out to game and manipulate the engines may have as many or more resources to keep up with the engines than "the rest of us." So over the last few years, the search engines have continued to be more open with what they consider important as well as what abuses may get sites into trouble, perhaps realizing that there are also a lot of sites that may not have been purposely trying to mislead the engines, but were just victims of bad advice. And of course the algorithms have become far more powerful and fine-tuned than they once were.
By openly helping everyone, they are really just helping to raise the bar of quality for all sites, and maybe even making it harder for bad sites to game the engines. Along with providing more detailed information and answering more and more questions publicly, the greatest advancement they have made has been in creating tools that give site owners (who have validated their sites) more information about their sites than they have ever had before.
Webmaster Central and Webmaster Tools
Google introduced Webmaster Central, which continues to add more and more features for site owners. Not surprisingly, Webmaster Central is leading the pack in delivering great information and tools to Webmasters. At the center, literally, is Webmaster Tools, which provides site owners with fairly detailed information on site crawling, queries, considerably more backlink information than can be queried outside of Tools, and much more. The query information in particular provides an unprecedented view of the search phrases that a site is showing up for, including those terms that aren't actually delivering traffic to the site.
Site Explorer
Yahoo's Site Explorer is still lacking in a few areas compared with Google's Webmaster Tools, but it almost makes up for that with its powerful link information. Through simple drop-down menus, it is quick and easy to tailor results based on links to a specific page or the entire site, to include all links or to exclude links from the site and focus on external links only. Yahoo added a new feature that may give even more control to site owners. The Dynamic URLs tab gives site owners the ability to inform Yahoo of their site's dynamic URL patterns to help eliminate duplicate content issues, handle multiparameter URLs better, address session IDs, and even present "cleaner" URLs in search results. Ideally, it would be best to address as many of these issues as possible through rewrites and the robots.txt file on the server, but this is a great addition as a backup or for when that isn't possible.
Webmaster Portal
Trying not to be outdone, the Live Search team at MSN recently announced their entry into the mix with the Webmaster Portal, currently in beta and by request only. Little detailed information is available, but their tool also claims to help troubleshoot crawl issues, assist with sitemaps, and provide site statistics, including a replacement for the "link:" operator query that was decommissioned back in March. The portal is slated to be fully available to the public by late fall, but it may be worthwhile to request an invite to participate in the beta now.
The advancement in all of these tools is great news to Webmasters and SEOs alike. They continue to put more information and control into our hands. Not wanting to be outdone by the others, hopefully each of the engines will add each other's additions to their own toolsets. As each of these is free, there is no reason for site owners not to take a few minutes to validate their sites and start spending a little time each month putting these tools to work for them. This is one invitation from the spiders you don't want to turn down.
Maile Ohye of Google
The recently revised and expanded Google Webmaster Guidelines were a hot topic in this interview. Many webmasters have just enough knowledge of SEO to be dangerous, and the expanded Guidelines will hopefully help them avoid shooting themselves in the foot. Maile commented that sometimes webmasters (and even some SEO firms) may not know that what they are doing is against Google's guidelines, which is why the feedback tools are so important for webmasters to get a better understanding of what Googlebot wants to see.
Some highlights from my interview with Maile include:
- Cloaking: Sometimes even SEO firms don't "get" what cloaking actually is. Maile advised playing it safe and staying away from the "grey" area, and remember to serve the same content to both users and search engines.
- Session IDs: If session IDs in your URLs are dragging down your rankings, Maile suggested that you could throw the session variable into a cookie and utilize some 301 redirects to ensure that your customer's experience is not interrupted while optimizing search engine visibility.
- Flash: I had asked Maile about Flash, AJAX, and other "eye candy", because so many brand-centric retailers and manufacturers rely heavily on such technologies on their sites, and such approaches aren't typically search-friendly. Maile stated that the best option if you want to use Flash is to incorporate it as a complement to your text-based website. Even though some brands like to offer a Flash and a non-Flash site, Maile said that this approach is confusing to your customers and it's not ideal from an SEO standpoint. Your PageRank could be spread too thin across both sites, so you end up diluting your link importance to the point that neither version has the opportunity to rank well. Furthermore, a Flash version that isn't bookmarkable actually discourages deep linking. "Progressive enhancement," which relies on "noscript" tags, is an approach to making Flash more search engine friendly. Maile confirmed that Google looks at the content within "noscript" tags, but be careful that the content you include within the noscript tags accurately mirrors the Flash-based content, or it will look like cloaking to Googlebot.
- Sitemaps: Maile was enthusiastic about sitemaps as a powerful addition to your SEO arsenal. She advised that when using them, be sure to use the canonical URLs, scrubbed clean of superfluous variables. Sitemaps.org is a great resource if you have any questions, and it's something pretty amazing. MSN, Yahoo! and Google have all agreed to utilize the XML Sitemaps Protocol that Google pioneered.
- Paid Links: Maile reaffirmed Matt Cutts's previously stated position that Google is against buying links for PageRank. She said that the reason Google took its stance against "paid links" is that the practice is deceptive to users -- making it about the size of the wallet rather than about merit.
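To make the Session IDs and Sitemaps points above a little more concrete, here is a minimal Python sketch of stripping session variables from URLs and listing only the resulting canonical URLs in a sitemap. The parameter names in SESSION_PARAMS, the function names, and the example.com URLs are assumptions of mine for illustration, not anything Maile or Google specified. In practice the server would also need to carry the session in a cookie and 301 redirect the session-ID URL to the canonical one, which this sketch does not attempt.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
import xml.etree.ElementTree as ET

# Hypothetical session parameter names; use whatever your platform emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def canonical_url(url):
    """Drop session parameters and any fragment from a URL."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))

def build_sitemap(urls):
    """Return sitemap XML that lists each canonical URL exactly once."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in sorted({canonical_url(u) for u in urls}):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

crawled = [
    "http://www.example.com/shoes?color=red&sid=abc123",
    "http://www.example.com/shoes?color=red&sid=zzz999",
    "http://www.example.com/hats?PHPSESSID=42",
]
sitemap_xml = build_sitemap(crawled)
print(sitemap_xml)
```

The two /shoes URLs differ only in their session variable, so they collapse to a single sitemap entry.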
Not only was Maile very accommodating to many of my tough questions, she also went a step further and took some notes on questions she felt Google needed to address. I had asked about Google's take on the appropriate uses for the rel=nofollow "link condom", as well as more targeted questions toward large ecommerce sites. She mentioned that more updates will be coming from Google soon, based on feedback she has received for their new guidelines.
After the interview, I caught up again with Maile at BlogHer in Chicago. With Google's aggressive conference schedule, they are actively trying to get more face time with webmasters and SEOs to answer any questions they might have, and to network within the SEO community. Kudos to Google and to Maile for that! Hopefully Maile gets enough breaks from all the traveling to recharge her batteries and to enjoy all the perks offered "on campus" at the Googleplex (like unlimited snacks!).
In chairing the AMA's Hot Topic: Search Engine Marketing conference on Friday, I had the pleasure of hearing the presentation of Trevor Foucher, Software Engineer at Google. Not only did Trevor explain the value of Google Webmaster Central as well as some webmaster do's and don'ts, he also gave us attendees an inside glimpse of Google.com -- through statistics provided in Google Webmaster Central to verified sites, and Google.com was a verified site! So we got to see, for example, that Webmaster Central reports 19,818,383 total links to www.google.com and 1,432,828 links to the www.google.com home page. And that the top search queries for the Google.com site, on average over the last 3 weeks, were: google, translate, translation, froogle, google translate, "www google", finance, google finance, google talk, calendar, youtube, google analytics. And that the most popular words used in external links pointing to Google.com were: google, help, jobs, google web directory, ads by google, google bookmarks, aggiungi su google.
What I found most fascinating in all the reports was Googlebot's error reporting for the Google.com site. The following URLs made it to the top of the report, all with 404 errors:
http://www.google.com/%20
http://www.google.com/%20%20
http://www.google.com/%20%20%20
Apparently a large number of sites incorrectly link to Google with an extra space or two (or three) after the /. I pointed out to Trevor that Google should 301 redirect those %20 URLs to the Google.com home page to funnel the PageRank to the home page. That would let Google pass more PageRank down into deeper pages of their site.
Trevor's response was that Google isn't concerned about SEO'ing their site. I can see his point. But still, it's a missed opportunity, because most of their site doesn't have much PageRank. Webmaster Central showed the PageRank breakdown of the pages of Google.com (roughly) as follows:
I
III
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIII
where the first row is High, second row is Medium, third row is Low, and fourth is "PageRank not yet assigned".
Google's mantra in response to all SEO questions is "Do what's good for the user." In this case, 301 redirecting is good for the user. Just look at the horrible user experience you get if you click on a link leading to http://www.google.com/%20. Yuch! There's nowhere to even go.
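For what it's worth, the logic for such a redirect is small. The Python sketch below only decides where a request should be 301 redirected when its path ends in stray encoded spaces; the function name redirect_target is hypothetical, and wiring it into an actual web server is left out. It is an illustration of the idea, not a description of how Google's servers work.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def redirect_target(url):
    """Return the URL to 301 redirect to if the path ends in stray
    whitespace (such as %20), or None if no redirect is needed."""
    parts = urlsplit(url)
    decoded = unquote(parts.path) or "/"  # "/%20" decodes to "/ "
    cleaned = decoded.rstrip() or "/"     # drop the trailing whitespace
    if cleaned == decoded:
        return None
    return urlunsplit(
        (parts.scheme, parts.netloc, quote(cleaned), parts.query, parts.fragment)
    )

for bad in ("http://www.google.com/%20",
            "http://www.google.com/%20%20",
            "http://www.google.com/%20%20%20"):
    print(bad, "->", redirect_target(bad))
```

All three of the error URLs from the report map to the home page URL, while a clean URL returns None and is served as usual.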




