September 7, 2007 4:45 AM PDT

White House says blocking Iraq Web documents was 'mistake'

by Declan McCullagh

The Bush administration says that blocking search engines from indexing key Iraq-related documents on its White House Web site was a simple mistake.

Until Thursday, the White House was using a robots.txt file that instructed search engines not to visit publicly accessible Iraq files on Whitehouse.gov, including a January strategy report (PDF) and a July benchmark report (PDF).

This public report on Iraq was marked as off-limits to search engines by Whitehouse.gov through the robots.txt file.

After I pointed out the problem in phone conversations with the White House, it revised its robots.txt file, meaning the progress report on Iraq due next week should be visible through Google, MSN, and so on.

"It was not intentional, and we have corrected the mistake," White House spokesman Blair Jones told me.

I've put the pre-Thursday version of their robots.txt file online here so you can see for yourself.

The other odd thing I noticed is that Whitehouse.gov was programmed to block search engines from indexing a photo gallery of President Bush in a flight suit standing in front of that famous Iraq "Mission Accomplished" banner in May 2003.

What's odd is that the gallery, which has since been moved, was the only one on the entire Whitehouse.gov site listed as off-limits. To be fair, though, the current location is not off-limits.

By way of background, there was a flap in late 2003 about the White House using robots.txt to tell search engine bots to stay away from "/iraq" pages. The reason: the same file was posted in the main section and duplicated in the "/iraq" section. It's the same logic as blocking text-only pages; here's an example of the same text appearing in three different templates: normal, text-only, and printer-friendly. The White House seems to have subsequently discontinued the Iraq template.

That explains the "/nsc/iraq" directory being marked as off-limits to search engines. But out of 767 mentions of "/iraq" in the robots.txt file from 2003, the sole Iraq press release or gallery listed as blocked this week (a) represents a uniquely embarrassing moment for the Bush administration and (b) has been the subject of revisionism.

Don't believe me? Bush's carrier speech originally was titled, according to the Internet Archive, "President Bush Announces Combat Operations in Iraq Have Ended" and featured photographs of smiling Iraqi children. At some point the children vanished and the speech was quietly renamed: "President Bush Announces Major Combat Operations in Iraq Have Ended." Another USS Abraham Lincoln-related switch: before and after.

Jones said that robots.txt entry, too, was just a coincidence. He told me: "We reorganized our Iraq content into one 'In Focus' area and the Web team inadvertently missed these folders in the robots.txt file."

I should point out that the White House is not the only federal Web site to have an overzealous robots.txt file. For no good reason that I can discern, National Intelligence Director Mike McConnell blocks search engines from his entire organization's Web site. In his case and that of Whitehouse.gov, it's time for a friendly amendment to the Robots Exclusion Protocol: Search engines should ignore robots.txt when a government agency uses it, intentionally or unintentionally, to keep public documents away from the public.

P.S.: Here's what I did to see which directories of interest are listed as off-limits to search engines (you'll have to replace the URL to robots.txt with the archived one I linked to above):

sh-2.05a$ wget http://whitehouse.gov/robots.txt -O wh.txt -o log.txt
sh-2.05a$ grep -v text wh.txt
User-agent:     *
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /news/releases/2003/05/images/iraq
Disallow:       /news/releases/iraq
Disallow:       /nsc/iraq

User-agent:     whsearch
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /sitemap.html
Disallow:       /privacy.html
Disallow:       /accessibility.html
sh-2.05a$ 
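The same check can be done per crawler rather than by eye. The sketch below is an illustration, not the method used above: it copies the filtered listing printed in the transcript into a temporary file (so it is not the full archived robots.txt), and the helper names `rules_for` and `is_disallowed` and the two example paths are hypothetical. It assumes one `User-agent` line per group with whitespace after the colon, and it does not model a crawler falling back to the `*` group when no named group matches.

```shell
#!/bin/sh
# Sample data: the rules shown in the transcript above (already filtered
# with "grep -v text"), written to a temporary file.
robots_file=$(mktemp)
cat > "$robots_file" <<'EOF'
User-agent:     *
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /news/releases/2003/05/images/iraq
Disallow:       /news/releases/iraq
Disallow:       /nsc/iraq

User-agent:     whsearch
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /sitemap.html
Disallow:       /privacy.html
Disallow:       /accessibility.html
EOF

# rules_for AGENT FILE
# Print the Disallow paths listed in AGENT's group (hypothetical helper).
rules_for() {
    awk -v agent="$1" '
        { sub(/\r$/, "") }                     # tolerate CRLF line endings
        tolower($1) == "user-agent:" {         # a new group starts here
            in_group = (tolower($2) == tolower(agent))
            next
        }
        in_group && tolower($1) == "disallow:" && $2 != "" { print $2 }
    ' "$2"
}

# is_disallowed AGENT PATH FILE
# Succeed (exit 0) if PATH begins with any Disallow prefix in AGENT's group.
is_disallowed() {
    rules_for "$1" "$3" | awk -v path="$2" '
        index(path, $0) == 1 { found = 1 }     # rule is a prefix of path
        END { exit !found }
    '
}

# How many rules in the wildcard group mention "iraq"?
rules_for '*' "$robots_file" | grep -c iraq    # prints 3

# Example paths below are made up for illustration.
if is_disallowed '*' /nsc/iraq/index.html "$robots_file"; then
    echo "off-limits to crawlers that follow the * group"
fi
if ! is_disallowed whsearch /nsc/iraq/index.html "$robots_file"; then
    echo "not listed in the whsearch group"
fi
```

Plain prefix matching is why a single entry such as /nsc/iraq covers everything beneath that directory.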

Declan McCullagh, CNET News' chief political correspondent, chronicles the intersection of politics and technology. He has covered politics, technology, and Washington, D.C., for more than a decade, which has turned him into an iconoclast and a skeptic of anyone who says, "We oughta have a new federal law against this."
Misleading headline, silly article
by TPHB2 September 7, 2007 6:22 AM PDT
Robots.txt does not "block" anything. It is a request that search spiders not index from that page root. Well-behaved search engines like Google/MSN honor it, but it's not like the documents are hidden.

Rather than some conspiracy, the most likely explanation is that the White House web guys want to minimize deep searches, as they're moving stuff around. It's not like anyone is going to have trouble finding the Iraq report all over the web.
Yeah sure
by vhac September 7, 2007 7:15 AM PDT
Most people would go to whitehouse.gov and click through every link instead of using a search engine *sarcastic*. It's not about the Iraq report, it's about the VERSION of Bush's Iraq report. Wait, wait, aren't the terrorists on their last breath and deathbed according to Dick Cheney and Bush? Or is it about clear and imminent danger from Saddam and Weapon of Mass Disappearance?
There isn't much of a good reason to keep public information about the government out of search engines except to limit knowledge.
Still
by umbrae September 7, 2007 7:17 AM PDT
Since Google and other search engines are the main way to find content, hiding these from search engines is a significant and "intentional" act.

This does mean the White House specifically and intentionally wanted to make sure these items were not searchable.
More BDS from CNet
by fafafooey September 7, 2007 6:23 AM PDT
More Bush Derangement Syndrome (BDS) from CNet.
Next we'll read...
by fafafooey September 7, 2007 6:27 AM PDT
that he snooped around some more and found that the last person who edited the file was user "krove".

That'll send CNet/MoveOn.org/Daily KoOKs off to the funny farm.
Right, Attack The Messenger
by Stating September 7, 2007 11:18 AM PDT
You're using the standard propaganda technique of trying to discredit the messenger (CNET) to take attention away from the message. Go back inside your Limbaughian bunker.
Mistake, yeah, I've heard that before
by Dr_Zinj September 7, 2007 8:31 AM PDT
The title of the article is neither "Misleading", nor is the article itself "silly".

For an organization as hyper-conscious of appearances as the White House, there's no such thing as a mistake.

My evaluation of this situation is that they want to have the material on the site to be able to say they are totally open about the information, yet the 'blocking' of search engine indexing means they can control and prevent casual access by the majority of web users who do not decompose sites to drill down into whatever may be stored there.

The Bush Administration is even worse than the Clinton Administration was when it comes to covering up potentially embarrassing information.
Gov. Under no obligation to make your life easier
by ballssalty September 7, 2007 9:06 AM PDT
Again I have to disagree with Declan. I don't do it often but on this issue I think he's getting obsessive.

The Gov. is under no obligation to make it easy for you to search their websites. There are a multitude of reasons to block search engines from digging deep into their websites. But the Whitehouse website apparently doesn't mind people searching it.

You seem to have missed that at the top right of the Whitehouse website there is this lovely blank field, and next to it is a button. I think it's called search? Yes, that's it. So the website already has a search engine. What's the problem? I found that January 2007 Iraq strategy report using their search, and it was the second result in the list.

And asking that search engines pick and choose whose robots.txt file to ignore will just open them up to lawsuits so thankfully they'll ignore this ridiculous request anyway.
Mistake is Bush was caught in yet another lie
by likes2comment September 7, 2007 1:35 PM PDT
Yeah, the mistake is that the White House was caught in yet another lie and clumsy cover-up attempt.