February 19, 2009 5:41 AM PST

Recovery.gov blocked search engine tracking

by Chris Soghoian


Update: As of 8 a.m. PST, within three hours of this story first going live, it appears that President Obama's Web team has (silently) pulled the robots.txt file from the Recovery.gov Web site. The site is now open to Web crawlers of all kinds.

The Obama administration has apparently opted to forbid Google and other search engines from indexing any content on the newly launched Recovery.gov.

Is this even more evidence that the administration's much-publicized commitment to transparency is simply hype?

Recovery.gov, which went live Tuesday, is set to act as a central clearinghouse for information related to the newly signed American Recovery and Reinvestment Act. The legislation is designed to stimulate the flagging U.S. economy.

In a video message, available on YouTube and embedded into the new site, President Obama states that the "size and scale of (the stimulus) plan demands unprecedented efforts to root out waste, inefficiency, and unnecessary spending. Recovery.gov will be the online portal for these efforts." He adds that the new site will be used to publish information on how the stimulus funds will be spent in a "timely, targeted, and transparent manner."

Although the site is advertised as proof of the president's commitment to transparency, its technical design seems to betray that spirit. Most importantly, the site currently blocks all requests by search engines, which would ordinarily download and index each page to make the information more accessible to the Web-searching public.

The site's robots.txt file has just a few lines of text:

# Deny all search bots, web spiders
User-agent: *
Disallow: /

Although the White House Web team did not immediately respond to a request for comment, the single-line comment at the top of the file indicates that the blocking of search engines is no accident but rather a statement of policy.

Many sites use a robots.txt file to communicate, in machine-readable terms, the Web pages that they do and don't wish to be indexed by search engines. While the files don't carry much, if any, legal weight, most search engines act as good Internet citizens and honor the requests.
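In machine-readable terms, the file quoted above reduces to one rule covering every crawler. The minimal sketch below uses Python's standard-library `urllib.robotparser` to evaluate that file the way a compliant crawler would. The helper name `allowed` and the user-agent strings are illustrative assumptions, not anything taken from Recovery.gov or from the search engines. The sketch only models crawlers that choose to honor the file, because nothing in robots.txt technically prevents a fetch.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, as served by Recovery.gov at launch.
ROBOTS_TXT = """\
# Deny all search bots, web spiders
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # comment lines are ignored
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "User-agent: *" matches every crawler, and "Disallow: /" covers every path.
    for agent in ("Googlebot", "Slurp", "msnbot"):  # illustrative agent names
        print(agent, allowed(ROBOTS_TXT, agent, "http://www.recovery.gov/"))
    # With no rules at all (roughly the state after the file was pulled),
    # a compliant crawler may fetch everything.
    print("no rules:", allowed("", "Googlebot", "http://www.recovery.gov/"))
```

A compliant crawler gets `False` for every page under the launch-day file and `True` once the rules are gone.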

Luckily for the millions of Americans who might wish to find out how their money is going to be spent, it seems that Google has opted to ignore the administration's restrictive robots.txt on the stimulus-related site. It is unclear whether this is due to an error or a manual override by someone at Google, but a quick search turns up more than 60 Web pages on Recovery.gov that have been indexed by the search engine's Web crawlers in just the past three days.

Also, the stimulus bill requires that the site be run by the new Recovery Accountability and Transparency Board, but it seems to currently be under the control of the White House Web team--the same folks who revamped Whitehouse.gov and whose use of the robots.txt search engine-blocking code was expanded after the site initially was praised by bloggers for its openness.

It is this blogger's hope that with a bit of gentle prodding by members of the pro-transparency community, Recovery.gov's administrators will correct the "unintentional oversight" that was made in launching the site with such a restrictive robots.txt file.

Christopher Soghoian delves into the areas of security, privacy, technology policy and cyber-law. He is a student fellow at Harvard University's Berkman Center for Internet and Society, and is a PhD candidate at Indiana University's School of Informatics. His academic work and contact information can be found by visiting www.dubfire.net/chris/. He is a member of the CNET Blog Network, and is not an employee of CNET. Disclosure.
77 Comments (page 1 of 3 shown)
by wolivere February 19, 2009 5:52 AM PST
Not sure what's worse?

So they don't want bots and spiders. Most likely anyone who wants the info has the webpage info and can go to it and find the info on their own directly. I don't see much of a problem there.

But then Google ignores protocol and overrides the desire to not allow spiders and bots.

Shouldn't the article have read "Google Ignores Requests Not to Index Site, Indexes Recovery.gov Anyhow"?
by Zoobie February 19, 2009 8:07 AM PST
Not sure what's worse?!

A new US President who promises transparency, increases the national debt to unprecedented levels so that our grandchildren will be paying this off, and then secretly sets up code to not allow indexing of the website that is supposed to help people understand what's happening with that money...

...or...

A for-profit company with an objective of making the internet easily accessible to everyone decides to make searchable the website that is set up for the benefit of American citizens to track the government's openness.

Hmmm.... I'm going out on a limb and saying Google isn't evil in this instance.
by 7679vaska February 19, 2009 9:45 AM PST
As someone who works on websites, I think the robots.txt may have been used for the 'not quite ready for launch yet, but make it public anyway' phase. I don't see what the fuss is about; perhaps they forgot, or needed some extra time before things got archived forever by search engines. No human who visited the site was denied access to its information.

As for Google, I believe they may have opted to index (especially .gov sites) with or without the explicit instructions of robots.txt, but if the content was in their index before the robots.txt was removed or modified, that would be a concern and should be investigated further.
by tm_anon February 19, 2009 9:33 PM PST
@Zoobie

I hate to tell you, but the debts we're getting now will be paid by our great-grandchildren. Our grandchildren will still be paying for Bush.
by February 21, 2009 5:21 PM PST
I have to agree with this. As a web developer, I would almost certainly put something like that in my code until I was ready to get indexed.

I think Christopher is jumping to conclusions in thinking that this was some high-level policy decision.
by directorblue February 19, 2009 5:59 AM PST
Gee, you mean the Democrats lied about transparency?

You say they lied when they said the stimulus bill had no pork or earmarks in it?

And they lied when they said the stimulus wouldn't pay off their constituencies like ACORN, the SEIU and trial lawyers?

And they lied about how awful the economy is, likening it to the Great Depression with 25% unemployment, thousands of bank failures, and starvation in the streets?

And they lied about the root cause of the financial meltdown, insisting it was over-regulation when everyone knows that Democratic insiders treated Fannie Mae like a piggy-bank, with accounting scandal after accounting scandal leading up to the meltdown?

And they lied about pulling troops out of Iraq and Afghanistan?

I don't know about you, but I'm really shocked.
by msanto February 19, 2009 8:01 AM PST
I'm shocked this commenter forgot who was in charge all those years when the economy went in the dumper. Not the Dems.
by tdreher February 19, 2009 8:08 AM PST
@msanto

Look up the Community Reinvestment Act; the economy collapsed due to the subprime mortgages required by Clinton and Carter. Maybe you should read more and watch less MSNBC. It's common knowledge to most people, not a conspiracy to blame the Dems.
by Randomletters1 February 19, 2009 8:10 AM PST
Ugh. Who had the majority in the Legislative branch for most of the Bush Admin?

Partisan ******** is useless when they all cash checks from the same lobbying groups, and are all only interested in solidifying and expanding their own personal power base.

No one, NO ONE, in Washington is blameless, nor are any of us. If we complain loudly and frequently enough, things may change. If we trust in one man to change everything for us, we're well and truly screwed. Get involved in solutions, not in complaining about accomplished fact.

For every political gaffe that someone attributes to one party, there is an equal misstep committed by the other. The only difference is in when it occurred.
by egghead1619 February 19, 2009 8:12 AM PST
And I'm shocked you still think there's much difference between the parties. They are all politicians only interested in what will heighten their image to their constituents. Everyone keeps saying that it was Bush who caused the economic collapse, but they neglect to realize that Congress had at least as much involvement and industry leaders may have had even more. The legislators aren't representing the people; they are representing the lobbying groups that provide the most money.
by directorblue February 19, 2009 9:01 AM PST
@msanto -

Unless one party has a filibuster-proof majority -- which the GOP never had -- that statement is meaningless.

This testimony -- in illustrated, comic book format -- is easy to follow:

http://directorblue.blogspot.com/2008/09/testimony-that-will-have-you-pulling.html

The House Banking & Finance Committee blocked all oversight of the housing market facilitators (FHA, Fannie, Freddie) for years... on a straight party line.

Check out the testimony. You won't be sorry.
by pentest February 22, 2009 1:53 PM PST
Those mortgages destroyed the banking industry because of derivatives and the deregulation that caused it.

Stop listening to Rush and you won't sound like such an idiot.
by Xenophons_Gunny February 19, 2009 6:14 AM PST
"There's a chill in the air." -- Anon.
by February 19, 2009 6:14 AM PST
When I have been working on a new website, I have configured the robots.txt file to disallow indexing. Then, when it is ready to go into production, I change the file to allow indexing. I always fear that I will forget to make this change. I wonder if this task was forgotten here. Has anyone contacted the Recovery.gov webmaster to confirm that this was intentional?
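The staging-then-production routine described in this comment can be guarded so that forgetting the last step fails loudly. What follows is a minimal sketch, assuming a Python deploy script. The file contents, the function names `blocks_homepage` and `assert_ready_for_launch`, and the example.com address are all hypothetical. The check relies on the standard-library `urllib.robotparser` and refuses to deploy while a catch-all "Disallow: /" is still in place.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical staging file: ask every compliant crawler to stay out.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"
# Hypothetical production file: an empty Disallow value permits everything.
PRODUCTION_ROBOTS = "User-agent: *\nDisallow:\n"


def blocks_homepage(robots_txt: str, site: str) -> bool:
    """True if a generic compliant crawler would be refused the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", site.rstrip("/") + "/")


def assert_ready_for_launch(robots_txt: str, site: str) -> None:
    """Raise if the staging 'deny all' rule was left in the production file."""
    if blocks_homepage(robots_txt, site):
        raise RuntimeError(f"robots.txt for {site} still blocks all crawlers")


if __name__ == "__main__":
    assert_ready_for_launch(PRODUCTION_ROBOTS, "https://www.example.com")  # passes
    try:
        assert_ready_for_launch(STAGING_ROBOTS, "https://www.example.com")
    except RuntimeError as err:
        print("deploy blocked:", err)
```

Running a check like this as a deploy step replaces "I hope I remember" with a hard failure.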
by jon_abad February 19, 2009 6:28 AM PST
I concur.
I would hope that a news organization like Cnet would bother to ask the White House's web folks for a comment in order to determine if the spider blocking in robots.txt was on purpose as opposed to just posting conjecture.
by Archus February 19, 2009 7:19 AM PST
I have to concur too. I not only do this, but I also disallow indexing of sensitive sites, simply because there are times Google can find a way into something that a person would never think of. I mean, really, there are times when you only want people to enter through the front door.
by ghostofitpast February 19, 2009 7:33 AM PST
jon_abad makes a good point about what sort of a "news organization" CNET actually is. Lots of news organizations now use their Web site as a platform for a large team of bloggers who can expand the breadth of that organization's coverage. One can understand that the host organization takes a minimal approach to editing these bloggers. However, in this case the question arises as to whether anyone on the CNET news staff takes the time to read what those bloggers write. The content of this post deserves the treatment of serious journalism to determine whether this is really a "story with legs" or an alarmist reaction to standard Web design practice.
by msanto February 19, 2009 8:03 AM PST
Good point by ghostofitpast.

This actually sounds like the ramblings of someone who should be writing for Ann Coulter or some other right-winger, also. I wonder what political affiliation this writer has.
by johnqh February 19, 2009 10:07 AM PST
I concur too.

People making a big deal about this obviously never did a website.

During construction, the website may contain template data, placeholders, etc., or may simply not be working. You don't want it to be indexed.

Conclusion?
1. Chris Soghoian never did a website himself.
2. Google's index engine has a bug.
by Dalkorian February 19, 2009 11:15 AM PST
But that just doesn't sound evil enough. Retardicans *MUST* make all Democrats seem evil at all times, especially now that America has awoken from our imposed 8-year nightmare.

Terrorism is all they have to offer because it's all they know.
by rmva February 19, 2009 6:17 AM PST
It never occurred to me that Google would ignore robots.txt. What an idiot I am!
by danielvictorio February 19, 2009 6:17 AM PST
The purpose of allowing bots and spiders is that a web page WILL be found by those not originally connected to the issue, say a student, someone who wants to learn something, or someone who is just exploring.
by ewsachse February 19, 2009 6:23 AM PST
Sure Google is our friend. Not.

They do not honor robots.txt, and they conspire with China to oppress Chinese citizens while making a quick profit.

To those who rant and rave about the Obama administration after only a few weeks in office, would you really expect this type of open dialog from the Bush administration? How about the hypothetical McCain administration. McCain admitted that he does not know how to use a personal computer, and Palin probably would have thought the server was a moose and shot it.

As of today, if you click on http://www.recovery.gov/robotx.txt, you get a 404 file not found.

Is this just another stupid story from some worthless blogger at C/Net?

Why not just write another article fawning over Apple like 99% of the rest of your articles?
by TerraKhan February 19, 2009 6:36 AM PST
It is not surprising that clicking on http://www.recovery.gov/robotx.txt gives you a 404 error.

The correct file can be located at http://www.recovery.gov/robots.txt and is still intact.

An error such as this effectively invalidates all of your other comments, but thank you for playing.
by gggg sssss February 19, 2009 5:10 PM PST
Google is an American citizen, so to speak. This site was developed with Google's money as much as anyone else's (maybe more, since Google is actually profitable). Why should they NOT search the site they paid for?
by Get_Bent February 19, 2009 6:34 AM PST
Meet the new boss, same as the old boss....

-- The Who, "Won't Get Fooled Again"
by ddmcd February 19, 2009 6:38 AM PST
Did anyone ask the White House why this is being done?
by santacruz12 February 19, 2009 6:39 AM PST
Get used to the Orwellian nature of the Obama Administration. It is called doublespeak, where you describe what you are doing with verbiage that is the opposite of what you are really doing. War is Peace is a good and famous example from Orwell's 1984. You see in the comments on this page the willingness of many to go along with this and believe that Concealment is Transparency. It is not going to get better. Thanks, Chris, for bringing this to light. Otherwise we would have no idea of what was taking place.
by Dalkorian February 19, 2009 11:21 AM PST
Retardicans are simply amazing. If this was bushit, the site wouldn't exist at all. People would only hear about it from a select few hand-picked "journalists" who were approved by bushit and chimpy personally and most of what you heard would be a lie.

Obama created a site telling us what's going on and you're crying like infants about a robots.txt file as if it were the most evil thing on the planet.

When I finish laughing at you, I'll start pitying you.

(Note - this file has since been removed, which is a good thing. I'm not arguing that Obama is the messiah or anything, but I don't have the short-term memory issue many others seem to be suffering.)
by kenstech_com February 19, 2009 6:39 AM PST
The Great Obama (PBUH) works in Mysterious Ways. Perhaps you need to look within yourself and try to understand the latent racism that leads you to doubt Him. You must never question... you must only Believe!

Ken
www.kenstech.com
by Dalkorian February 19, 2009 11:21 AM PST
LOL - did you come up with this yourself or did Limbaugh help you?
by bvdon February 19, 2009 7:12 AM PST
It is interesting to note that the recovery.gov site that is all about transparency launched AFTER he signed the TRILLION dollar stimulus bill.

The Internet is going to explode on Obama.... Bush got it pretty bad, but more voices are going to be using the Internet and Obama is going to get nailed (deservedly so).
by gerrrg February 19, 2009 7:17 AM PST
Sounds more like FUD. BFD boys, BFD.
by Archus February 19, 2009 7:25 AM PST
I have to say, I find it sad that people are willing to opine against an entire administration and presidency based solely on the content of a robots.txt file. I guess next we'll be trying the administration as terrorists for closing their curtains at night. Can we at least give the guy a couple of months to work with? I mean, we gave Bush 8 years to get it wrong.
by Dalkorian February 19, 2009 11:22 AM PST
That's simply unfair to retardicans. We should give them more leeway, being mentally challenged and all.
by chris565 February 19, 2009 7:30 AM PST
Oh please. A new site launching as a VERY small part of a huge project (transition to power, getting stimulus passed, using it, dealing with everything else) and a bunch of geeks who still live with mom and dad want to nit-pick the fact that they aren't allowing searches as the site goes up. No word on the cool use of the MIT timeline, nothing about the fact that this site is there in the first place, just pissing and whining.

Like gerrg says, B. F. D.
by yarrdstick February 19, 2009 7:34 AM PST
Seriously people have no sense of reality.

Do you really think there was a "Mr. President, how would you like our Robots.txt to be configured" conversation?

Even if it was intentional and not just forgotten while the site was being launched, this is probably the first Obama has heard of it.

Ever heard the phrase don't throw the baby out with the bathwater?
by gggg sssss February 19, 2009 5:12 PM PST
Remember, this site was developed by government employees, or contractors from EDS. Or worse, a bunch of guys from Mumbai. Neither is the sharpest at the best of times.
by Randomletters1 February 19, 2009 7:38 AM PST
To anyone who begs us to go easy on this administration: Why? Yeah, it's a young administration, but we're not in the best spot. Obama got elected because enough people thought he was the right person to enact change for the better. I see nothing, not a thing, wrong with demanding that he make those changes happen. If you want to give him time, then how much time is enough? Also, since this is a brand-new site, you'd think that they wouldn't have to deal with 'undoing' any past errors with regard to transparency, or anything else for that matter.

Our government should be held accountable every day, every frapping day, for the things that they say they will do and also for the things that they actually do. Discrepancies between the two should be called out and explained. That's how our government works: we select people to represent us and they are obliged to be able to explain their actions to their constituency. Just because accountability has been in short supply in the past isn't reason enough to ignore accountability now. In fact, it's a damn good reason to step up our efforts to ensure that things change.

I do HOPE that things will change, but realistically, in a system where all parties are pretty much bought and paid for by the same masters, that change will only come when the masses truly demand the sort of representation that they are promised. So, in that vein, hell yes I'll be irate about stuff like this. If enough of us are, maybe it will get through the beltway logic-barrier.

Oy.
by Zoobie February 19, 2009 8:11 AM PST
Well said.
by Dalkorian February 19, 2009 11:28 AM PST
by Randomletters1 February 19, 2009 7:38 AM PST
If you want to give him time, then how much time is enough?

---------------------------------------------------------------------------

Are you really asking for a timeline here? It took bushit 8 years to destroy this nation and you expected Obama to fix it all inside of a month?

Are you delusional or a retardican?

I'm willing to give him the benefit of the doubt for most of this year. If things still look pretty crummy in the fall, I'll start calling for the pace to be picked up too. But after only a month in office, you've hardly given him time to get his coat off before casting him back out for not fixing the problems your wife made while making dinner.
by pentest February 22, 2009 2:01 PM PST
Do you really think Obama told them what to put in a text file?
by gsekse February 19, 2009 7:41 AM PST
If I ran a little website with nothing on it, google would honor my robots.txt request. Why bother.

BUT, if I run a US government-sponsored site with information that is required to be accessible to taxpayers, I am sure that a company like Google or Yahoo or Excite, yada yada, would be happy to explore the site in spite of the request not to. If you don't want Google in your stuff, don't post it to an unsecured website.

As a taxpayer, be glad that there are many ways to monitor what the hell is going on. Washington has way too many secrets as it is.
by renonative February 19, 2009 7:46 AM PST
Well, it looks like they fixed it by removing the robots.txt file altogether.
by cuwickliffe February 19, 2009 7:48 AM PST
The robots.txt file is gone now. Looks like it was for when they were building the site. Panic may not subside.
by Dalkorian February 19, 2009 11:30 AM PST
Of course it won't, and apologies will never come. It wasn't a problem per se, but a cause for the whiny, infantile retardicans to rally around: another scratch in Obama's armor to point at and magnify as much as possible.

That's what losers do.