Recovery.gov blocked search engine tracking (Credit: Recovery.gov)
Update: As of 8 a.m. PST, within three hours of this story first going live, it appears that President Obama's Web team has (silently) pulled the robots.txt file from the Recovery.gov Web site. The site is now open to Web crawlers of all kinds.
The Obama administration has apparently opted to forbid Google and other search engines from indexing any content on the newly launched Recovery.gov.
Is this even more evidence that the administration's much-publicized commitment to transparency is simply hype?
Recovery.gov, which went live Tuesday, is set to act as a central clearinghouse for information related to the newly signed American Recovery and Reinvestment Act. The legislation is designed to stimulate the flagging U.S. economy.
In a video message, available on YouTube and embedded into the new site, President Obama states that the "size and scale of (the stimulus) plan demands unprecedented efforts to root out waste, inefficiency, and unnecessary spending. Recovery.gov will be the online portal for these efforts." He adds that the new site will be used to publish information on how the stimulus funds will be spent in a "timely, targeted, and transparent manner."
Although the site is advertised as proof of the president's commitment to transparency, its technical design seems to betray that spirit. Most importantly, the site currently blocks all requests by search engines, which would ordinarily download and index each page to make the information more accessible to the Web-searching public.
The site's robots.txt file has just a few lines of text:
# Deny all search bots, web spiders
User-agent: *
Disallow: /
Although the White House Web team did not immediately respond to a request for comment, the single-line comment at the top of the file indicates that the blocking of search engines is no accident but rather a statement of policy.
Many sites use a robots.txt file to communicate, in machine-readable terms, the Web pages that they do and don't wish to be indexed by search engines. While the files don't carry much, if any, legal weight, most search engines act as good Internet citizens and honor the requests.
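This is a minimal sketch of how a crawler that honors the file reads the three lines quoted above. It uses the `urllib.robotparser` module from Python's standard library. It assumes search engines apply the rules roughly the way this parser does, although real crawlers have their own implementations. The user-agent names are only illustrative. The narrower `PARTIAL` rule set is a hypothetical alternative and was never published by Recovery.gov.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt served by Recovery.gov at launch, as quoted in the article.
DENY_ALL = """# Deny all search bots, web spiders
User-agent: *
Disallow: /
"""

# Hypothetical alternative (not from the site): hide only an unfinished section.
PARTIAL = """User-agent: *
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a "last fetched" time so can_fetch() evaluates the parsed rules.
    parser.modified()
    return parser


deny_all = build_parser(DENY_ALL)
partial = build_parser(PARTIAL)

# "User-agent: *" matches every crawler, so "Disallow: /" blocks every path.
for agent in ("Googlebot", "Slurp", "msnbot"):
    print(agent, deny_all.can_fetch(agent, "http://www.recovery.gov/"))

# The narrower file blocks only URLs whose path starts with /drafts/.
print(partial.can_fetch("Googlebot", "http://www.recovery.gov/"))
print(partial.can_fetch("Googlebot", "http://www.recovery.gov/drafts/page.html"))
```

The server enforces none of this. A crawler that never calls `can_fetch()` can still download every page. That is why honoring robots.txt is a matter of good Internet citizenship rather than legal obligation.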
Luckily for the millions of Americans who might wish to find out how their money is going to be spent, it seems that Google has opted to ignore the administration's restrictive robots.txt on the stimulus-related site. It is unclear if this is due to an error or a manual override by someone at Google, but a quick search turns up more than 60 Web pages on Recovery.gov that have been indexed by the search engine's Web crawlers in just the past three days.
Also, the stimulus bill requires that the site be run by the new Recovery Accountability and Transparency Board, but it currently seems to be under the control of the White House Web team: the same folks who revamped Whitehouse.gov, and who expanded that site's use of robots.txt to block search engines after bloggers initially praised it for its openness.
It is this blogger's hope that with a bit of gentle prodding by members of the pro-transparency community, Recovery.gov's administrators will correct the "unintentional oversight" that was made in launching the site with such a restrictive robots.txt file.
Christopher Soghoian delves into the areas of security, privacy, technology policy and cyber-law. He is a student fellow at Harvard University's Berkman Center for Internet and Society, and is a PhD candidate at Indiana University's School of Informatics. His academic work and contact information can be found by visiting www.dubfire.net/chris/. He is a member of the CNET Blog Network, and is not an employee of CNET.





So they don't want bots and spiders. Most likely anyone who wants the info already knows the site's address and can go there and find it on their own. I don't see much of a problem there.
But then Google ignores the protocol and overrides the request not to allow spiders and bots.
Shouldn't the article have read: "Google Ignores Request Not to Index Site, Indexes Recovery.gov Anyhow"?
A new US President who promises transparency, increases the national debt to unprecedented levels so that our grandchildren will be paying this off, and then secretly sets up code to not allow indexing of the website that is supposed to help people understand what's happening with that money...
...or...
A for-profit company whose objective is making the Internet easily accessible to everyone decides to make searchable the website that is set up for the benefit of American citizens to track the government's openness.
Hmmm.... I'm going out on a limb and saying Google isn't evil in this instance.
As for Google, I believe they may have opted to index sites (especially .gov sites) with or without the explicit instructions of robots.txt, but if the content was in their index before the robots.txt was removed or modified, then that would be a concern; this should be investigated further.
I hate to tell you, but the debts we're getting now will be paid by our great-grandchildren. Our grandchildren will still be paying for Bush.
I think Christopher is jumping to conclusions in thinking that this was some high-level policy decision.
You say they lied when they said the stimulus bill had no pork or earmarks in it?
And they lied when they said the stimulus wouldn't pay off their constituencies like ACORN, the SEIU and trial lawyers?
And they lied about how awful the economy is, likening it to the Great Depression with 25% unemployment, thousands of bank failures, and starvation in the streets?
And they lied about the root cause of the financial meltdown, insisting it was over-regulation when everyone knows that Democratic insiders treated Fannie Mae like a piggy-bank, with accounting scandal after accounting scandal leading up to the meltdown?
And they lied about pulling troops out of Iraq and Afghanistan?
I don't know about you, but I'm really shocked.
Look up the Community Reinvestment Act: the economy collapsed due to the subprime mortgages required by Clinton and Carter. Maybe you should read more and watch less MSNBC. It's common knowledge to most people, not a conspiracy to blame the Dems.
Partisan ******** is useless when they all cash checks from the same lobbying groups, and are all only interested in solidifying and expanding their own personal power base.
No one, NO ONE, in Washington is blameless, nor are any of us. If we complain loudly and frequently enough, things may change. If we trust in one man to change everything for us, we're well and truly screwed. Get involved in solutions, not in complaining about accomplished fact.
For every political gaffe that someone attributes to one party, there is an equal misstep committed by the other. The only difference is when it occurred.
Unless one party has a filibuster-proof majority -- which the GOP never had -- that statement is meaningless.
This testimony -- in illustrated, comic book format -- is easy to follow:
http://directorblue.blogspot.com/2008/09/testimony-that-will-have-you-pulling.html
The House Banking & Finance Committee blocked all oversight of the housing market facilitators (FHA, Fannie, Freddie) for years... on a straight party line.
Check out the testimony. You won't be sorry.
Stop listening to Rush and you won't sound like such an idiot.
I would hope that a news organization like Cnet would bother to ask the White House's web folks for a comment in order to determine if the spider blocking in robots.txt was on purpose as opposed to just posting conjecture.
This actually sounds like the ramblings of someone who should be writing for Ann Coulter or some other right-winger, also. I wonder what political affiliation this writer has.
People making a big deal about this have obviously never built a website.
During construction, a website may contain template data, placeholders, etc., or may simply not work yet. You don't want it indexed.
Conclusion?
1. Chris Soghoian never did a website himself.
2. Google's index engine has a bug.
Terrorism is all they have to offer because it's all they know.
They do not honor robots.txt, and they conspire with China to oppress Chinese citizens while making a quick profit.
To those who rant and rave about the Obama administration after only a few weeks in office, would you really expect this type of open dialog from the Bush administration? How about the hypothetical McCain administration. McCain admitted that he does not know how to use a personal computer, and Palin probably would have thought the server was a moose and shot it.
As of today, if you click on http://www.recovery.gov/robotx.txt, you get a 404 file not found.
Is this just another stupid story from some worthless blogger at C/Net?
Why not just write another article fawning over Apple like 99% of the rest of your articles?
The correct file can be located at http://www.recovery.gov/robots.txt and is still intact.
An error such as this effectively invalidates all of your other comments, but thank you for playing.
-- The Who, "Won't Get Fooled Again"
Obama created a site telling us what's going on, and you're crying like infants about a robots.txt file as if it were the most evil thing on the planet.
When I finish laughing at you, I'll start pitying you.
(Note - this file has since been removed, which is a good thing. I'm not arguing that Obama is the messiah or anything, but I don't have the short-term memory issue many others seem to be suffering.)
Ken
www.kenstech.com
The Internet is going to explode on Obama.... Bush got it pretty bad, but more voices are going to be using the Internet and Obama is going to get nailed (deservedly so).
Like gerrg says, B. F. D.
Do you really think there was a "Mr. President, how would you like our Robots.txt to be configured" conversation?
Even if it was intentional and not just forgotten while the site was being launched, this is probably the first Obama has heard of it.
Ever heard the phrase "don't throw the baby out with the bathwater"?
Our government should be held accountable every day, every frapping day, for the things that they say they will do and also for the things that they actually do. Discrepancies between the two should be called out and explained. That's how our government works: we select people to represent us and they are obliged to be able to explain their actions to their constituency. Just because accountability has been in short supply in the past isn't reason enough to ignore accountability now. In fact, it's a damn good reason to step up our efforts to ensure that things change.
I do HOPE that things will change, but realistically, in a system where all parties are pretty much bought and paid for by the same masters, that change will only come when the masses truly demand the sort of representation that they are promised. So, in that vein, hell yes I'll be irate about stuff like this. If enough of us are, maybe it will get through the beltway logic-barrier.
Oy.
If you want to give him time, then how much time is enough?
---------------------------------------------------------------------------
Are you really asking for a timeline here? It took bushit 8 years to destroy this nation and you expected Obama to fix it all inside of a month?
Are you delusional or a retardican?
I'm willing to give him the benefit of the doubt for most of this year. If things still look pretty crummy in the fall, I'll start calling for the pace to be picked up too. But after only a month in office, you've hardly given him time to get his coat off before casting him back out for not fixing the problems your wife made while making dinner.
BUT, if I ran a US government-sponsored site with information that is required to be accessible to taxpayers, I'm sure that a company like Google or Yahoo or Excite, yada yada, would be happy to explore the site in spite of the request not to. If you don't want Google in your stuff, don't post it to an unsecured website.
As a taxpayer, be glad that there are many ways to monitor what the hell is going on. Washington has way too many secrets as it is.
- by cuwickliffe February 19, 2009 7:48 AM PST
- The robots.txt file is gone now. Looks like it was for when they were building the site. Panic may not subside.
- by Dalkorian February 19, 2009 11:30 AM PST
- Of course it won't, and apologies will never come. It wasn't a problem per se, but a cause for the whiny infantile retardicans to rally around: another scratch in Obama's armor to point at and magnify as much as possible.
That's what losers do.