White House expands use of search-blocking code
The White House has silently tripled the number of site directories it asks Google and other search engines not to crawl. Is this a bad omen or much ado about nothing?
Within hours of Barack Obama being sworn in as president, bloggers and tech journalists began to closely examine the new White House Web site for hidden indicators as to how he would shape future tech policy.
While I focused my efforts on the White House privacy policy, others looked to the new administration's robots.txt file, which lays out boundaries that search engines like Google should follow when crawling the site.
When the new Obama geek team posted its sparse robots.txt to the Web, tech pundits soon held it up as a sign of the President's commitment to openness and transparency, and as proof that someone tech-savvy was finally running the show.
Blogger Jason Kottke hailed the move, writing that it was "a small and nerdy measure of the huge change in the executive branch of the U.S. government today." Another blogger, Ben Orenstein, compared the new Obama robots.txt file with the 2,400-line file used by the Bush White House, writing, "I think you've got a lovely little microcosm; one that points to a hopeful and open future."
The big fuss?
These digerati were excited by the fact that the new White House robots.txt file contained just two lines:
User-agent: *
Disallow: /includes/
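Each `Disallow` line names a path prefix that a cooperating crawler is asked to skip; everything else stays open. As a minimal sketch, Python's standard-library `urllib.robotparser` can parse the two-line file quoted above and report whether a given URL may be crawled. The crawler name `Googlebot` and the two sample URLs are illustrative assumptions, not pages taken from the actual site.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted above.
OBAMA_DAY_ONE = """\
User-agent: *
Disallow: /includes/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(OBAMA_DAY_ONE)

# Hypothetical URLs, chosen only to illustrate prefix matching.
for url in ("http://www.whitehouse.gov/includes/main.js",
            "http://www.whitehouse.gov/blog/"):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```

With this file, the `/includes/` URL is reported as blocked and the `/blog/` URL as allowed. The check is advisory only: it tells a well-behaved crawler what the site asks, and nothing prevents a client from fetching a disallowed path anyway.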
Fast-forward one week, and the White House has silently started to expand its use of the robots.txt search engine-blocking mechanism. As of Friday morning, the file now contains the following text:
User-agent: *
Disallow: /includes/
Disallow: /search/
Disallow: /omb/search/
While it would be accurate to state that the White House has in one day tripled the number of directories it excludes from Google crawling, it is also important to note that this is not a big deal. In fact, it doesn't matter at all.
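The "tripled" figure counts `Disallow` directives: one in the original file, three in the Friday version. That is a 200 percent increase, since (3 - 1) / 1 = 2. A small sketch, assuming each `Disallow` line counts as one excluded directory, makes the arithmetic explicit; the helper names `disallow_rules` and `percent_increase` are hypothetical.

```python
# Both versions of the White House robots.txt, as quoted in this article.
BEFORE = """\
User-agent: *
Disallow: /includes/
"""

AFTER = """\
User-agent: *
Disallow: /includes/
Disallow: /search/
Disallow: /omb/search/
"""


def disallow_rules(robots_txt: str) -> list[str]:
    """Return the path prefixes named on non-empty Disallow lines."""
    rules = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        # An empty "Disallow:" means "block nothing", so it is not counted.
        if key.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def percent_increase(old: int, new: int) -> float:
    """Percentage change from old to new (old must be non-zero)."""
    return (new - old) / old * 100


before = disallow_rules(BEFORE)
after = disallow_rules(AFTER)
print(len(before), "->", len(after),
      f"({percent_increase(len(before), len(after)):.0f}% increase)")
print("newly blocked:", [path for path in after if path not in before])
```

Running it prints `1 -> 3 (200% increase)` and `newly blocked: ['/search/', '/omb/search/']`. The parser is deliberately simplified (it ignores inline comments and per-agent groups), which is enough for files this short.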
For the most part, the Bush White House's use of robots.txt was entirely legitimate, a point that Kevin Fox, an engineer at FriendFeed, made to the folks at Google Blogoscoped:
This is a bit silly. The old robots.txt excludes internal search result pages and redundant text versions of HTML pages. This is exactly what robots.txt is for. Google's Webmaster Guidelines state "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
It's understandable that the robots.txt of an 8-year-old site is longer than that of a 1-day-old site, and it's not as if '/secrets/top' or '/katrina/response/' were put in the robots file.
Fun as it may be, this is a nonstory.
Those bloggers drunk on hope, who desperately wanted to see proof of Obama's commitment to his campaign promises of transparency and Google Government, now face a difficult choice. They can accept that robots.txt files are not a set of digital tea leaves through which to read the new administration. Or, if robots.txt does carry weight, they must find a way to explain a 200 percent increase in the number of directories blocked by Obama's Web team as anything but Cheney-esque secrecy.
Simply put, the robots.txt file was created and managed by engineers, not lawyers or policy makers. It is not the place to judge the president on tech policy issues.
The president's tech policy should instead be judged on real issues: how many former RIAA and MPAA lawyers will be given positions of power in the administration, who ends up working at the FTC and FCC, and who will be named the new cybersecurity czar.
As for the president's commitment to transparency, he has already violated his pledge to post all nonemergency bills on the Whitehouse.gov Web site for five days before signing them. The text of the Lilly Ledbetter Fair Pay Act of 2009, which was signed into law yesterday, was certainly not posted to Whitehouse.gov for anywhere near five days.
Obama's broken commitment to transparency remains advertised on the White House blog:
One significant addition to WhiteHouse.gov reflects a campaign promise from the president: we will publish all nonemergency legislation to the Web site for five days, and allow the public to review and comment before the president signs it.
It is by these kinds of concrete issues that we should judge the president, not by robots.txt.
Christopher Soghoian delves into the areas of security, privacy, technology policy and cyber-law. He is a student fellow at Harvard University's Berkman Center for Internet and Society, and is a PhD candidate at Indiana University's School of Informatics. His academic work and contact information can be found by visiting www.dubfire.net/chris/. He is a member of the CNET Blog Network, and is not an employee of CNET. Disclosure.



You said it. It's not even important enough for an article of this length. Adding something to the robots.txt file in no way makes information unavailable. Many search engines even ignore it. It only instructs search engines not to include content that may not display properly or that would be duplicated. This looks like they are telling engines to ignore their search results, which is a proper use of robots.txt and has nothing to do with censorship or openness. A robots.txt file does not hide information, nor is it very effective at blocking search engines.
And don't forget his outright reversal on the telecom immunity, on the record with CNet.
Big surprise, Obama's a liar. He is a career politician, and his mouth moves a lot.
I actually considered voting for him briefly, until he picked the Neo-Cheney VP Biden.
McCain was the worse choice, obviously, but it looks to be shaping up for a continual stream of disappointment from the first mixed-race president. Our political system is growing more obviously worthless, when real choices are so few. The only solution is for a majority of voters to wake up and move past the 2-party system - it fails miserably in the 21st Century. I will never vote either Republican or Democrat.
Here's my take on things: this is a new administration running the show. Things like how long legislation is posted on the Web site before it's signed into law, and which paths are or aren't listed in the robots.txt file: all this kind of stuff takes a while to settle before it starts running smoothly.
After all, how efficient is *your* office? I can say at my company, I get information from various different sources with conflicting instructions & expectations of when things will be done by.
Yes, yes, I guess the White House should be perfect from Day 1. Meanwhile, back to reality?
I agree. It seems like people expect that the Presidency is a dictatorship, in that he can just walk into office and expect things to happen within the hour. It doesn't even work that way in my office, and there's barely any red tape and much less security to worry about.
Give the guy a break for a second. I mean it's not like he's declared war (or a police action) on anyone.
Horses**t! All the left did for 8 years was cry and whine about Bush this, Bush that... Looks like [Obama] is going to have to get used to it. Did you think we'd make it easy on him? Hell No! Just remember that 46% of us did not vote for him, hardly what you call a landslide or mandate...
[CNET editors' note: Prohibited content edited out.]
What about his comment is bigoted? He is correct in that Obama did not win by a landslide, and the people who did not vote for him are going to hold his feet to the fire for each promise he made. This is no different than what happened with Bush. Just because someone didn't vote for Obama doesn't make them a bigot. Just because you don't agree with Obama doesn't make you a bigot.
Chris,
Interesting article, but as you pointed out about the two blogs you cited, this too is a non-article. It's good to keep tabs on promises, but this early on I won't hold it against him. If this becomes a problem over the next few months, then it's an article.
From the title "WHITE HOUSE EXPANDS USE OF SEARCH-BLOCKING CODE" I thought this was something serious.
But instead I find yet another article analyzing the use of a robots.txt file on the president's website, as if that is some sort of indicator of national policy.
Slow news day I guess.
Bush's administration left the White House information system in shambles.
Obama has not failed to keep his promises. He has been shoveling Republican crap out of the barn before he can bring in the new horses.
- by pmocek February 6, 2009 5:16 PM PST
- re: broken commitment
The White House Blog posted today (http://www.whitehouse.gov/blog_post/update_on_sunlight_before_signing/):
"As we've noted on the blog, the President has signed the Lilly Ledbetter Fair Pay Act and the Children's Health Insurance Program Reauthorization Act. We've also published the DTV Delay Act of 2009.
"Since a few questions have come in, we want to update you on the President's campaign commitment to introducing more sunlight into the lawmaking process by posting non-emergency legislation online for five days before signing it. This policy will be implemented in full soon; currently we are working through implementation procedures and some initial issues with the congressional calendar.
The President remains committed to bringing more transparency to government, and in this spirit the White House will continue to publish legislation expected to come to his desk online for public comment as it moves through Congress."