September 7, 2007 4:30 PM PDT

National Intelligence Web site no longer invisible to search engines

by Declan McCullagh

Until a few hours ago, the Web site of National Intelligence Director Mike McConnell had been invisible in Google, MSN and Yahoo searches. That's because dni.gov's robots.txt file told search engines to stay away.

Now it's been fixed. DNI spokesman Ross Feinstein told me, apologetically, a moment ago: "When we saw your story posted, I asked our developers to look into it...We certainly appreciate you bringing it to our attention. It's a public Web site. We want it to be indexed. We're not even sure how (the robots.txt file) got there."

National Intelligence Director Mike McConnell's official public Web site had been invisible to search engines. After an article appeared here at News.com, he fixed the problem.

The robots.txt file can't force search engines to ignore certain Web sites or sections of Web sites, but most indexing bots will abide by the requests. When dealing with government sites, this is a mistake, but more on this below.
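To make that concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The file contents below are an assumption: a blanket exclusion of the sort that hides an entire site. The actual dni.gov file is not reproduced in this article, and "ExampleBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents (not the real dni.gov file): a blanket exclusion
# that asks every crawler to stay away from every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow line means nothing is off-limits.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.dni.gov/index.html"
    print(is_allowed(BLOCK_ALL, "ExampleBot", url))
    print(is_allowed(ALLOW_ALL, "ExampleBot", url))
```

Nothing enforces the answer. A crawler that never runs a check like this can fetch the page anyway, which is why the file is a request rather than a lock.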

By way of background, I wrote a blog post on August 24 pointing out the invisible dni.gov Web site (and a handful of other .gov and .mil sites). Then I wrote a follow-up this morning about the White House's Web site blocking Iraq documents via robots.txt, and then lifting the ban after we spoke on the phone this week.

DNI spokesman Feinstein said that the robots.txt file was first fixed on Monday, but when the site was updated on Tuesday with a media advisory, the original, prohibitive version of robots.txt was restored. Now it's presumably fixed for good.

Now, I'm the last person to suggest that using robots.txt to cordon off subsets of your Web site is somehow evil. At CNET News.com, we use it to tell search engines not to index our "e-mail story" pages, for instance, and on my own personal Web site I use it as well. Blocking misbehaving Web crawlers is important and necessary.
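A selective file of that kind might look like the sketch below. The "/email-story/" path and the "BadBot" crawler name are hypothetical, chosen only for illustration; they are not the actual rules used at News.com or on my own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one misbehaving crawler entirely, and
# ask everyone else to skip only the "e-mail story" pages.
PARTIAL_RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /email-story/
"""

parser = RobotFileParser()
parser.parse(PARTIAL_RULES.splitlines())

# Hypothetical URLs on a placeholder domain.
article = "http://news.example.com/politics/article.html"
email_page = "http://news.example.com/email-story/12345"

# An ordinary crawler may index articles but not the e-mail pages.
print(parser.can_fetch("ExampleBot", article))
print(parser.can_fetch("ExampleBot", email_page))

# The misbehaving crawler is asked to stay away from everything.
print(parser.can_fetch("BadBot", article))
```

The difference from a blanket "Disallow: /" is one of scope: the rest of the site stays visible to search engines.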

But why should a public federal Web site be entirely marked as off-limits to search engines? There's no good reason. I can think of two bad reasons: (a) avoiding the situation in which a report that turns out to be embarrassing has already been cached by Google and Archive.org, and (b) letting the feds modify a file such as a transcript without anyone noticing. (The White House has quietly altered photo captions before, and I've documented how a transcript of a public meeting was surreptitiously deleted--and then restored.)

I don't know why DNI chose to be invisible in searches. Their explanation of a simple mistake, like the one the White House gave me earlier this week, is certainly plausible. But this is why, I'll say once again, we need a modest revision to the Robots Exclusion Protocol: Search engines should ignore robots.txt when a government agency is using it to keep public documents hidden from the public.
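One possible reading of that revision, sketched from the crawler's side, follows. Everything here is an assumption for illustration: the proposal does not spell out a mechanism, and treating any host ending in .gov or .mil as a government site publishing public documents is a deliberate simplification. The function names and "ExampleBot" are made up.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumption: a host counts as "government" if it ends in one of these.
GOVERNMENT_SUFFIXES = (".gov", ".mil")


def is_government_host(url: str) -> bool:
    """Crude test based only on the host name's suffix."""
    host = urlsplit(url).hostname or ""
    return host.endswith(GOVERNMENT_SUFFIXES)


def should_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Honor robots.txt everywhere except on government hosts."""
    if is_government_host(url):
        # Proposed exception: the exclusion request is ignored.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    block_all = "User-agent: *\nDisallow: /\n"
    print(should_crawl(block_all, "ExampleBot", "http://www.dni.gov/index.html"))
    print(should_crawl(block_all, "ExampleBot", "http://www.example.com/index.html"))
```

A real policy would need a finer test than a host suffix, since the proposal targets public documents being hidden, not every file an agency might reasonably keep out of an index.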

Declan McCullagh, CNET News' chief political correspondent, chronicles the intersection of politics and technology. He has covered politics, technology, and Washington, D.C., for more than a decade, which has turned him into an iconoclast and a skeptic of anyone who says, "We oughta have a new federal law against this." E-mail Declan.
No no change for robots needed
by walirick September 7, 2007 7:47 PM PDT
What is needed is for journalists to keep doing their job, really doing their job, and not wait for Yahoo or Google to do it for them: keep the public informed, keep the government aware that the only reason it is there is by our discretion and, yes, trounce any official who wants to remove any right or freedom.
Why...
by nateman_99 September 8, 2007 10:25 AM PDT
Why can't we have both?
It's Risky To Your Health
by Stating September 8, 2007 4:50 PM PDT
If journalists get too close to the truth, they either lose their job or their life or both. That's why the drilldown to the truth always stops at some point.
Why Would $2.3 Trillion Go Missing?
by Stating September 8, 2007 11:06 AM PDT
Why was robots.txt used, but more importantly, why would $2.3 trillion go missing from the DoD? Nobody seems interested in the missing money. That's a far bigger story.

http://www.cbsnews.com/stories/2002/01/29/eveningnews/main325985.shtml
"More money for the Pentagon, CBS News Correspondent Vince Gonzales reports, while its own auditors admit the military cannot account for 25 percent of what it spends.

"According to some estimates we cannot track $2.3 trillion in transactions," Rumsfeld admitted."
Oh.. Sorry..
by 8ball629 September 8, 2007 12:46 PM PDT
I bought some personal stuff with that money. Think of it as a loan.
Wrong
by Busboy2 September 8, 2007 1:02 PM PDT
That article is totally wrong. The entire US government only has a budget of 2.66 trillion.