Much ado about Whitehouse.gov's new openness
Fans of President Barack Obama, or perhaps just those who dislike former President George W. Bush, seem to think there's something notable about the way the new White House Web site is configured to deal with search engines.
That configuration file is called robots.txt. It's designed to let Webmasters ask search engine robots not to include certain areas of a Web site in their index. Well-behaved robots will comply.
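As a rough illustration of what "well-behaved" means here, the sketch below uses Python's standard `urllib.robotparser` module to check paths against a set of rules before fetching them. The rules, the paths, and the `ExampleBot` user-agent name are hypothetical and are not taken from any actual White House robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the real Whitehouse.gov file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /text/
"""


def allowed_paths(robots_txt, paths, agent="ExampleBot"):
    """Map each path to True if `agent` may fetch it under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    sample = ["/", "/search/?q=budget", "/text/index.html", "/news/"]
    for path, ok in allowed_paths(ROBOTS_TXT, sample).items():
        print(path, "allowed" if ok else "disallowed")
```

Nothing forces a crawler to run a check like this; robots.txt is a request, and compliance is voluntary.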
The Obama revamp of Whitehouse.gov included a shorter robots.txt file, which Thenextweb.com called "a sign of greater transparency and change." A BoingBoing poster claimed that now "people can find information that was restricted before." And so on.
There's just one problem with these comments. They're wrong. As of Tuesday morning, the Bush administration's robots.txt file did only two things: first, it pointed search engines to the high-graphics versions of the page, as opposed to the text-only versions, and second, it tried to keep type-in-your-search-query pages from being indexed.
Those are legitimate reasons to list those pages in robots.txt, which is why CNET's own file is relatively long and complicated too. (Sites that have been around for eight years or longer tend to get that way.) We ask search engines not to index an "/Ads" directory, e-mail-this-story pages, and dozens of others. The Democrat-controlled House and Senate have--gasp!--substantial robots.txt files too.
It's true that in 2007, the Bush White House did block some files they should not have, which they fixed once I brought it to their attention. They also fixed a more serious problem with the Director of National Intelligence's Web site, and an earlier problem in 2003. (A better solution would be for search engines to ignore overly broad robots.txt files on .gov and .mil sites, including Thomas.loc.gov.)
If anything, Obama's robots.txt file is too short. It doesn't currently block search pages, meaning they'll show up on search engines--something that most site operators don't want and which runs afoul of Google's Webmaster guidelines. Those guidelines say: "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
And here's something sure to upset Obama-praising geeks: the new White House site doesn't pass the litmus test of good HTML design. Alas, according to the W3C, not all pages successfully validate. Those are your tax dollars at work.
P.S.: The White House seems to be using Akamai's Edge Platform for scalable Web hosting:
sh-2.05b$ host whitehouse.gov
whitehouse.gov has address 96.6.250.135
whitehouse.gov mail is handled by 105 mailhub-wh3.whitehouse.gov.
whitehouse.gov mail is handled by 100 mailhub-wh2.whitehouse.gov.
sh-2.05b$ host www.whitehouse.gov
www.whitehouse.gov is an alias for www.whitehouse.gov.edgekey.net.
www.whitehouse.gov.edgekey.net is an alias for e2561.b.akamaiedge.net.
e2561.b.akamaiedge.net has address 96.16.218.135
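The inference from that lookup can be sketched in a few lines of Python: follow the alias records from www.whitehouse.gov and see whether any hop lands on an Akamai-looking domain. The function names are made up for this example, and the suffix list is an assumption based only on the host names in the output above, not an authoritative list of Akamai domains.

```python
import re

# Lines copied from the `host www.whitehouse.gov` output quoted in the article.
HOST_OUTPUT = """\
www.whitehouse.gov is an alias for www.whitehouse.gov.edgekey.net.
www.whitehouse.gov.edgekey.net is an alias for e2561.b.akamaiedge.net.
e2561.b.akamaiedge.net has address 96.16.218.135
"""

ALIAS_RE = re.compile(r"^(\S+) is an alias for (\S+?)\.?$")
ADDRESS_RE = re.compile(r"^(\S+) has address (\S+)$")

# Assumption: suffixes taken from the output above, used as a rough Akamai hint.
AKAMAI_SUFFIXES = ("edgekey.net", "akamaiedge.net")


def alias_chain(host_output, name):
    """Follow alias records starting at `name`; return (chain, final address)."""
    aliases, addresses = {}, {}
    for line in host_output.splitlines():
        line = line.strip()
        alias = ALIAS_RE.match(line)
        address = ADDRESS_RE.match(line)
        if alias:
            aliases[alias.group(1)] = alias.group(2)
        elif address:
            addresses[address.group(1)] = address.group(2)
    chain = [name]
    # Stop when there is no further alias, or when an alias would loop back.
    while chain[-1] in aliases and aliases[chain[-1]] not in chain:
        chain.append(aliases[chain[-1]])
    return chain, addresses.get(chain[-1])


def looks_like_akamai(chain):
    """True if any host in the chain ends with one of the assumed suffixes."""
    return any(host.endswith(AKAMAI_SUFFIXES) for host in chain)


if __name__ == "__main__":
    chain, address = alias_chain(HOST_OUTPUT, "www.whitehouse.gov")
    print(" -> ".join(chain), "->", address)
    print("Akamai-style aliases:", looks_like_akamai(chain))
```

The sketch parses saved output instead of querying DNS, so it runs offline and reflects only the records as they stood when the lookup was made.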
Declan McCullagh, CNET News' chief political correspondent, chronicles the intersection of politics and technology. He has covered politics, technology, and Washington, D.C., for more than a decade, which has turned him into an iconoclast and a skeptic of anyone who says, "We oughta have a new federal law against this."






So let's hope that the US Government pushes for only allowing tools that fully comply with Section 508 and all those practices that make web sites human readable and machine processable. Italy has cracked down on bad tools and bad sites for government web sites. Here in the US, GSA should enforce the same rules and not allow any government tools that allow for bad practices.
You didn't mention disallowing server-specific URLs, which are another very bad practice (e.g. pages that expose .php, .aspx, .asp, .cfm, etc. within URLs). Or having XML files without stylesheets for human readability (including RSS feeds). Or moving forward with new standards, like microformats. I hope that this new administration understands the possibilities of making every site human- and machine-readable, and understands how this provides the maximum openness.
Daniel Bennett
http://www.advocatehope.org/
You might want to get on those inept morons CNET hired to design their webpages.
This page full of blather and nonsense isn't even valid XHTML transitional, much less valid strict.
335 errors, 137 warnings for direct URI testing. Many of those errors are a sign that whoever wrote it gets HTML 4.01 and XHTML 1 confused. Lots of old attributes that are no longer valid are used.
You need to complain about how your website sucks.
The difference is CNET's websites are old and should have all the errors worked out.
Declan should know better than to throw stones while living in a glass house.
Let's see what whitehouse.gov "developers" do over the next month.
All the author is doing is pointing out people are saying one thing, and in reality don't have a clue as to the merit of what they have to say.
So what if I wouldn't have?
Does that let Declan off the hook?
People are already talking about this, so there is nothing wrong with him searching for the facts on rumors. It is a good thing.
This is not nit picking. People are paid lots of money to go through the same decision process that Declan just laid out.
TV's Mythbusters is built on listening to rumors and finding the facts. Declan just didn't have a film crew.
Thanks for the article.
It is a humorous article, but somehow I don't think that was his goal.
Obama's site, as any casual observer recognizes, has more information than Bush's. Period. They'll no doubt adjust their crawling strategies to optimize, just like any site administrator does. Bush clearly just stuck out a minimal template robots file then ignored it.
There's no news here except that one can imply Obama has more up-to-date, hands-on engineers running the White House website. Hopefully we won't see "shocking" news stories for the next four years that read something like "White House operatives using government computers to receive email lists of groceries from their spouses to pick up on their drive home!"
But no... It turns out that Declan has another burr under his saddle about robots.txt. Not about a real story like a true comparison of the information that the old and new administrations have posted.
What's worse is the tease in the CNET Morning News Dispatch email, which led me here:
"President Obama's new White House Web site has been lauded for being more open than former President Bush's. There's just one problem with that theory: it's wrong"
So, naturally, I wanted to know what they meant, thinking that CNET was accusing the Obama administration of hiding documents, or limiting the amount of information that they put up on the site, or obfuscating and stonewalling.
There's only one problem with the tease, and with the lede of the story: They're deceptive.
The underlying problem is that Declan and/or his editors have decided to use teasers and ledes to pump up a tiny nit-pick over robots.txt into something that sounds like an allegation of fascist control of the truth. Tabloid-style journalism [sic] won't help CNET's credibility, and in the tech community -- rife with skeptics and those interested in actual facts -- it will cost you readers.
If there's a life to be gotten, Declan and CNET clearly need to be pointed in its direction.
Declan and CNET should be ashamed.
Just visit http://whitehouse.gov. Every American should be proud of this site.
CNET, I think it's time to get with the times and refresh your political correspondent.
One thing I worry about is the American people are expecting Obama to change the World. I admire the goals he has put forward and I think he is going to do great things but the Americans have to have realistic expectations of what can be done.
But Al Gore started the rumor about inventing the internet, not any reporter. I've seen the video of him saying it, and I find it hard to believe anyone posting here hasn't seen it--and he didn't say he'd funded it, he said he and a colleague "invented a little thing called the internet."
He took credit for pushing funding for it.
I don't know if Declan was the one responsible for pushing the lie, but it was someone in the media.
Similar improvements were made to the White House switchboard early in the Clinton administration, but the Republicans hired people to use modems to call in constantly and ruined it for everyone. Since the slush funds have since dried up for that sort of petty harassment (dirty tricks they were called), we can hope for a better government mechanism where more than the powerful can get their views expressed.
I just don't think there's one perfect website.
Check out what the W3C has to say about CNET.COM and other websites.
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.CNET.COM&charset=(detect+automatically)&doctype=Inline&group=0&user-agent=W3C_Validator%2F1.606
MICROSOFT.COM
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.MICROSOFT.com%2F&charset=(detect+automatically)&doctype=Inline&group=0&user-agent=W3C_Validator%2F1.606
MIT.EDU
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.mit.edu&charset=(detect+automatically)&doctype=Inline&group=0&user-agent=W3C_Validator%2F1.606
There are web designers who can create dynamic pages that validate 100% every time. It really isn't that hard and should be standard, but too many "web developers" aren't qualified.
IMO, a browser should put up an error page whenever it gets a poorly written html file. Quirks mode is the worst idea in tech ever. It encourages and excuses sloppy work.
Imagine if a compiler or interpreter just guessed at what was meant when it comes across a syntax error. You think software is bad now?
- by tkarmadragon January 21, 2009 1:04 PM PST
- Somebody was comparing a shorter robots.txt file to greater political transparency? Whaaat?!
- by pentest January 21, 2009 7:27 PM PST
- You have it backwards.
Man, the Obama fanaticism is insane. Even Jesus is starting to feel jealous.
Declan seems to be implying that transparency is worse today than last week, based purely on the size of a text file.
I do agree with your sentiment: "Whaaat?!"