White House says blocking Iraq Web documents was 'mistake'
The Bush administration says that blocking search engines from indexing key Iraq-related documents on its White House Web site was a simple mistake.
Until Thursday, the White House was using a robots.txt file that instructed search engines not to visit publicly accessible Iraq files on Whitehouse.gov, including a January strategy report (PDF) and a July benchmark report (PDF).

This public report on Iraq was marked as off-limits to search engines by Whitehouse.gov through the robots.txt file.
In response to phone conversations I had with them pointing out the problem, they've since revised their robots.txt file--meaning the progress report on Iraq due next week should be visible through Google, MSN and so on.
"It was not intentional, and we have corrected the mistake," White House spokesman Blair Jones told me.
I've put the pre-Thursday version of their robots.txt file online here so you can see for yourself.
The other odd thing I noticed is that Whitehouse.gov was programmed to block search engines from indexing a photo gallery of President Bush in a flight suit standing in front of that famous Iraq "Mission Accomplished" banner in May 2003.
That gallery, which has since been moved, was the only one on the entire Whitehouse.gov site listed as off-limits. To be fair, though, its current location is not blocked.
By way of background, there was a flap in late 2003 about the White House using robots.txt to tell search engine bots to stay away from "/iraq" pages. The reason for the entries: the same file was posted in the main section and duplicated in the "/iraq" section. It's the same logic as blocking text-only pages; here's an example of the same text appearing in three different templates: normal, text-only, and printer-friendly. The White House seems to have subsequently discontinued the Iraq template.
That explains the "/nsc/iraq" directory being marked as off-limits to search engines. But out of 767 mentions of "/iraq" in the robots.txt file from 2003, the sole Iraq press release or gallery listed as blocked this week (a) represents a uniquely embarrassing moment for the Bush administration and (b) has been the subject of revisionism.
Don't believe me? Bush's carrier speech originally was titled, according to the Internet Archive, "President Bush Announces Combat Operations in Iraq Have Ended" and featured photographs of smiling Iraqi children. At some point the children vanished and the speech was quietly renamed: "President Bush Announces Major Combat Operations in Iraq Have Ended." Another USS Abraham Lincoln-related switch: before and after.
Jones said that robots.txt entry, too, was just a coincidence. He told me: "We reorganized our Iraq content into one 'In Focus' area and the Web team inadvertently missed these folders in the robots.txt file."
I should point out that the White House is not the only federal Web site to have an overzealous robots.txt file. For no good reason that I can discern, National Intelligence Director Mike McConnell blocks search engines from his entire organization's Web site. In both that case and Whitehouse.gov's, it's time for a friendly amendment to the Robots Exclusion Protocol: Search engines should ignore robots.txt when a government agency uses it, intentionally or unintentionally, to keep public documents away from the public.
P.S.: Here's what I did to see which directories of interest are listed as off-limits to search engines (you'll have to replace the URL to robots.txt with the archived one I linked to above):
sh-2.05a$ wget http://whitehouse.gov/robots.txt -O wh.txt -o log.txt
sh-2.05a$ grep -v text wh.txt
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /help
Disallow: /news/releases/2003/05/images/iraq
Disallow: /news/releases/iraq
Disallow: /nsc/iraq
User-agent: whsearch
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /help
Disallow: /sitemap.html
Disallow: /privacy.html
Disallow: /accessibility.html
sh-2.05a$
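To see how a well-behaved crawler would interpret those entries, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules are the "User-agent: *" group copied from the listing above. The crawler name "ExampleBot", the helper blocked_paths and the sample paths are made up for illustration and are not real Whitehouse.gov URLs.

```python
from urllib.robotparser import RobotFileParser

# The "User-agent: *" group from the pre-Thursday robots.txt listing above.
RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /help
Disallow: /news/releases/2003/05/images/iraq
Disallow: /news/releases/iraq
Disallow: /nsc/iraq
"""

# Hypothetical paths, chosen only to exercise the rules above.
SAMPLE_PATHS = [
    "/nsc/iraq/example-report.pdf",
    "/news/releases/2003/05/images/iraq/example-gallery.html",
    "/example/iraq/index.html",
]


def blocked_paths(robots_text, paths, agent="ExampleBot"):
    """Return the paths that a compliant crawler named `agent` must skip."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # can_fetch() applies simple prefix matching against each Disallow line.
    return [path for path in paths if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    for path in blocked_paths(RULES, SAMPLE_PATHS):
        print("blocked:", path)
```

Because matching is by path prefix, the first two sample paths are reported as blocked while the third is not. This is why a single Disallow line can hide every document stored beneath a directory.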
Declan McCullagh, CNET News' chief political correspondent, chronicles the intersection of politics and technology. He has covered politics, technology, and Washington, D.C., for more than a decade, which has turned him into an iconoclast and a skeptic of anyone who says, "We oughta have a new federal law against this." E-mail Declan.




Rather than some conspiracy, the most likely explanation is that the White House web guys want to minimize deep searches, as they're moving stuff around. It's not like anyone is going to have trouble finding the Iraq report all over the web.
There isn't much of a good reason to keep public information about the government off the search engines except to limit knowledge.
This does mean the White House specifically and intentionally wanted to make sure these items were not searchable.
That'll send CNet/MoveOn.org/Daily KoOKs off to the funny farm.
For an organization as hyper-conscious of appearances as the White House, there's no such thing as a mistake.
My evaluation of this situation is that they want to have the material on the site to be able to say they are totally open about the information, yet the 'blocking' of search engine indexing means they can control and prevent casual access by the majority of web users who do not decompose sites to drill down into whatever may be stored there.
The Bush Administration is even worse than the Clinton Administration was when it comes to covering up potentially embarrassing information.
The Gov. is under no obligation to make it easy for you to search their websites. There are a multitude of reasons to block search engines from digging deep into their websites. But the Whitehouse website apparently doesn't mind people searching it.
You seemed to miss on the Whitehouse website on the top right where there is this lovely blank field and next to it is a button. I think it's called search? Yes that's it. So the website already has a search engine. What's the problem? I found that January 2007 Iraq strategy report using their search and it was the second result in the list.
And asking that search engines pick and choose whose robots.txt file to ignore will just open them up to lawsuits so thankfully they'll ignore this ridiculous request anyway.
Mistake is Bush was caught in yet another lie
by likes2comment, September 7, 2007 1:35 PM PDT
Yeah, the mistake is that the White House was caught in yet another lie and clumsy coverup attempt.