Responding to criticism from privacy activists, YouTube in the past two weeks has rolled out a number of new privacy features. Chief among these is a "delayed cookie" option that YouTube promises will not leave cookies in the browsers of users who have not yet clicked the "play" button to view a video.
While this statement is true for traditional Web browser-based cookies, YouTube's cookie-lite solution still leaves long-term, non-session Flash cookies behind in the Web browser of visitors who have yet to actually click play to watch the YouTube videos.
As revealed on this blog yesterday, YouTube has recently rolled out a number of new privacy features, chiefly in response to privacy activists complaining about the company's use of non-session cookies.
Writing on the Google corporate policy blog Tuesday, Steve Grove of YouTube stated:
To ensure that we openly communicate about privacy issues on all federal websites that use our technology, we created an embeddable video player that does not send a cookie until the visitor plays the video.
YouTube's online technical documentation also reveals a bit more about the feature:
Enabling delayed cookies means that the YouTube video player will not set any non-session cookies on the computer of a visitor (viewing the page on which the YouTube video is embedded). The YouTube video player may set non-session cookies on the visitor's computer once the visitor clicks on the YouTube video player.
While this statement is true of browser-based permanent cookies, it is false once Flash cookies are counted. Visitors to Web pages that have made use of this new cookie-lite feature continue to receive long-lasting Flash cookies, even when they do not click play to watch a video.
The Electronic Privacy Information Center has thoroughly described the Flash cookie privacy problem:
Flash cookies provide the only method by which a flash movie can store information on a user's computer....
Few consumers are aware of where Flash cookies are stored or how to control their use. Normal web cookies can be managed via the preferences dialog of most web browsers, but no similar utility is included for these Flash cookies. It is possible for Flash cookies to remain on user's computer indefinitely, as there is no mechanism to set an expiration date on Flash cookies.
The only way to delete these well-hidden objects is to visit a special Web page on Adobe's site. The existence of Flash cookies and the need to visit the special Adobe Web site to remove them is not widely known by most Web users.
Web browsers are unable to automate the process of Flash cookie removal. As a result, those in the security community have had to take rather extreme steps to try to automate the process of Flash cookie removal in a way that doesn't break most Web functionality. These obscure techniques remain far too advanced for non-technical users.
Proof of YouTube's use of Flash cookies
To verify that YouTube is still using non-session cookies, follow these steps:
- First, go to the Adobe Flash Settings Manager page, and delete all of your old Flash cookies.
A screenshot of an empty Flash cookie jar
- Close all of your browser tabs, and restart your browser. Now revisit the Adobe Flash Settings Manager page, and verify that you still have no Flash cookies.
- Then, go to a Web page that is making use of the new YouTube "delayed cookies" feature. For this example, we used Barack Obama's inaugural address, as embedded into one of the older White House blog entries.
(As we noted on this blog yesterday, the White House used an in-house Flash-based tool for its latest weekly video address. Earlier messages from the President are still delivered using YouTube, although the White House tech team has enabled the "delayed cookie" option for all of these).
- By looking through the source code for that blog page, we can verify that the YouTube flash file is indeed being served from youtube-nocookie.com, and thus should be making use of the "delayed cookie" feature.
<script type="text/javascript">
var params = { allowscriptaccess: "always", allowfullscreen: "true" };
swfobject.embedSWF("http://www.youtube-nocookie.com/v/3PuHGKnboNY&hl=en&fs=1&showinfo=0", "flashcontent", "480", "295", "8", null, {}, params);
</script>
- Wait for the YouTube flash file to load, but do not click play. Now, close all your browser tabs, and then restart the browser.
- Remember that session cookies, by definition, are for a single browsing session, and thus when you restart the browser, all previous session cookies are deleted. Anything still hanging around is long-term.
- Now, go back to the Adobe Flash Settings Manager, and you should see that a cookie from s.ytimg.com (a domain controlled by Google) has now been quietly added to your Flash cookie jar, even though the White House Web site made use of the "delayed cookie" option, and you never clicked the play button.
A screenshot of the flash-cookie jar, containing a cookie from YouTube
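For readers who would rather check from the file system than through the Adobe Settings Manager, the final step can be approximated with a short script. This is only a sketch: it assumes Flash Player keeps its local shared objects as .sol files under a "#SharedObjects" directory in the per-user locations shown below (these vary by operating system and Flash version), and the function names are hypothetical.

```python
import sys
from pathlib import Path


def flash_storage_roots(home, platform):
    """Return the directories where Flash Player is ASSUMED to keep .sol files."""
    if platform.startswith("win"):
        base = home / "AppData" / "Roaming" / "Macromedia" / "Flash Player"
    elif platform == "darwin":
        base = home / "Library" / "Preferences" / "Macromedia" / "Flash Player"
    else:
        base = home / ".macromedia" / "Flash_Player"
    return [base / "#SharedObjects"]


def find_flash_cookies(roots):
    """Map each site directory name to the .sol files stored beneath it."""
    found = {}
    for root in roots:
        if not root.is_dir():
            continue
        for sol in root.rglob("*.sol"):
            # Assumed layout: <root>/<random id>/<site domain>/.../<name>.sol
            parts = sol.relative_to(root).parts
            site = parts[1] if len(parts) > 2 else parts[0]
            found.setdefault(site, []).append(sol)
    return found


if __name__ == "__main__":
    roots = flash_storage_roots(Path.home(), sys.platform)
    for site, files in sorted(find_flash_cookies(roots).items()):
        print(site, len(files))
```

Running it before and after loading the embedded player (without clicking play) should show whether a new entry such as s.ytimg.com has appeared, provided the assumed storage path matches your system.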
Analysis
Those in the privacy community will likely pounce on this as evidence of Google's hypocrisy, while Google will likely respond by carefully parsing the definition of the phrase "non-session cookie" to not include Flash-cookie objects. Google might even argue that its Flash-based cookies do not contain unique tracking information (something this blogger is unable to verify, since the Adobe Flash Manager only allows you to delete, but not view the contents of a Flash cookie).
One thing is clear. YouTube has advertised a new delayed cookie feature, and stated that it "does not send a cookie until the visitor plays the video." That message is further reinforced by the fact that the new cookie-lite embedded video players are served from a different domain name, youtube-nocookie.com.
Yet a user visiting a page that includes one of these "delayed cookie" videos still ends up with a long-term, non-session Flash cookie hidden away in the depths of their browser.
Technical definitions of "cookie" versus "Flash cookie" aside, YouTube's "delayed cookie" feature simply fails to deliver on the company's promises.
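The domain check from the verification steps can also be scripted. The sketch below pulls the player URL out of a swfobject.embedSWF call and reports which host serves it. The regular expression and function names are illustrative assumptions that only handle embeds written like the White House snippet. A passing check shows only that the page uses the youtube-nocookie.com player, and says nothing about whether a Flash cookie is set.

```python
import re
from urllib.parse import urlparse

# Matches the first (URL) argument of a swfobject.embedSWF("...") call.
EMBED_CALL = re.compile(r'swfobject\.embedSWF\(\s*"([^"]+)"')

NOCOOKIE_HOSTS = {"youtube-nocookie.com", "www.youtube-nocookie.com"}

SAMPLE_PAGE = (
    '<script type="text/javascript"> '
    'swfobject.embedSWF("http://www.youtube-nocookie.com/v/3PuHGKnboNY'
    '&hl=en&fs=1&showinfo=0", "flashcontent", "480", "295", "8", '
    "null, {}, params); </script>"
)


def embedded_player_hosts(html):
    """Return the host name of every player URL passed to embedSWF."""
    return [urlparse(url).hostname for url in EMBED_CALL.findall(html)]


def uses_delayed_cookie_domain(html):
    """True only if the page has embeds and all come from the no-cookie domain."""
    hosts = embedded_player_hosts(html)
    return bool(hosts) and all(host in NOCOOKIE_HOSTS for host in hosts)


if __name__ == "__main__":
    print(embedded_player_hosts(SAMPLE_PAGE))
    print(uses_delayed_cookie_domain(SAMPLE_PAGE))
```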
When reached for comment, Marc Rotenberg, the director of the Electronic Privacy Information Center, said:
(Regarding the) spat over cookies, the Youtube and the Whitehouse web site is the tip of the iceberg. There is a much bigger debate about Google's role in federal information policy looming.
The Google blog post, if read carefully, is very revealing. It is all about justifying Google's growing dominance in government information dissemination.
This is a business plan. It is tied directly to YouTube's advertising model and revenue forecasts. There is nothing about actual federal information policy.
Complying with federal laws (e.g. the Privacy Act which regulates data collection) or federal policy on persistent cookies are real obstacles. The question is whether Google will decide for itself whether it will comply with these laws or the people's representatives.
The debate is just beginning.
Google's PR team have yet to respond to queries from this blogger regarding the cookie issue.
Disclosure: In 2008, I worked as a policy fellow for the Electronic Privacy Information Center. In 2006, I worked as a summer intern at Google, and have twice received graduate fellowships from the company.
(Credit: Recovery.gov)
Update: As of 8 a.m. PST, within three hours of this story first going live, it appears that President Obama's Web team has (silently) pulled the robots.txt file from the Recovery.gov Web site. The site is now open to Web crawlers of all kinds.
The Obama administration has apparently opted to forbid Google and other search engines from indexing any content on the newly launched Recovery.gov.
Is this even more evidence that the administration's much-publicized commitment to transparency is simply hype?
Recovery.gov, which went live Tuesday, is set to act as a central clearinghouse for information related to the newly signed American Recovery and Reinvestment Act. The legislation is designed to stimulate the flagging U.S. economy.
In a video message, available on YouTube and embedded into the new site, President Obama states that the "size and scale of (the stimulus) plan demands unprecedented efforts to root out waste, inefficiency, and unnecessary spending. Recovery.gov will be the online portal for these efforts." He adds that the new site will be used to publish information on how the stimulus funds will be spent in a "timely, targeted, and transparent manner."
Although the site is advertised as proof of the president's commitment to transparency, its technical design seems to betray that spirit. Most importantly, the site currently blocks all requests by search engines, which would ordinarily download and index each page to make the information more accessible to the Web-searching public.
The site's robots.txt file has just a few lines of text:
# Deny all search bots, web spiders
User-agent: *
Disallow: /
Although the White House Web team did not immediately respond to a request for comment, the single-line comment at the top of the file indicates that the blocking of search engines is no accident but rather a statement of policy.
Many sites use a robots.txt file to communicate, in machine-readable terms, the Web pages that they do and don't wish to be indexed by search engines. While the files don't carry much, if any, legal weight, most search engines act as good Internet citizens and honor the requests.
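To see how a well-behaved crawler would read the three lines quoted above, they can be fed to the robots.txt parser in Python's standard library. This is a minimal sketch: the helper name is_allowed and the "Googlebot" user-agent string are illustrative choices, and the result shows only what the file requests, not what any real search engine actually did.

```python
from urllib.robotparser import RobotFileParser

# The file contents as quoted in this post.
ROBOTS_TXT = """\
# Deny all search bots, web spiders
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.recovery.gov/"))
```

Because the rule applies to every user agent ("*") and disallows the root path, a compliant crawler is asked to stay away from every page on the site.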
Luckily for the millions of Americans who might wish to find out how their money is going to be spent, it seems that Google has opted to ignore the administration's restrictive robots.txt on the stimulus-related site. It is unclear if this is due to an error or a manual override by someone at Google, but a quick search turns up more than 60 Web pages on Recovery.gov that have been indexed by the search engine's Web crawlers in just the past three days.
Also, the stimulus bill requires that the site be run by the new Recovery Accountability and Transparency Board, but it seems to currently be under the control of the White House Web team--the same folks who revamped Whitehouse.gov and whose use of the robots.txt search engine-blocking code was expanded after the site initially was praised by bloggers for its openness.
It is this blogger's hope that with a bit of gentle prodding by members of the pro-transparency community, Recovery.gov's administrators will correct the "unintentional oversight" that was made in launching the site with such a restrictive robots.txt file.
Just 12 hours after this blog highlighted the privacy problems associated with the White House's use of embedded YouTube videos, the Obama team rushed to deploy a technical fix that significantly protects the privacy of many (but not all) of the site's visitors.
Since its launch three days ago, President Obama's White House Web site has included several embedded YouTube videos. While this certainly demonstrates that the 44th president is Web 2.0 savvy, the decision to embed YouTube videos has also enabled the Google-owned video-sharing site to sneakily collect data on the millions of people who visit Whitehouse.gov--even those users who never click the "play" button to actually watch one of the videos.
Change.gov, the Web site for the Obama/Biden transition team, also made extensive use of YouTube videos. This practice was something that I sharply criticized back in November, citing the cookie-related privacy risks as well as the decade-old rules prohibiting the use of long-term tracking cookies on federal agency Web sites.
Unfortunately, when the new White House Web site launched, rather than fix the privacy issues that had plagued the transition team's Web site, Obama's legal team instead opted to provide YouTube with an exemption to those pesky federal regulations, letting it use long-term cookies to track visitors to the White House Web site. No other company was singled out and granted such a waiver.
It seems that someone in the White House read my blog post yesterday--as within 12 hours of the story going live, Obama's Web team rolled out a technical fix that severely limits YouTube's ability to track most visitors to the White House Web site.
By late Thursday evening, each embedded YouTube video had been replaced with an image of a video player, which a user must click on before the real YouTube player will be loaded. The result of this change is that YouTube is now only able to use cookies to track users who click on the "play" button on an embedded YouTube video--the majority of people who scroll through a page without clicking play will not be tracked.
This is clearly a step in the right direction--and it is particularly interesting to see that the White House has essentially rolled their own version of the Electronic Frontier Foundation's MyTube privacy tool.
While this is great news (especially after just a few hours), it is by no means a comprehensive solution, but a Band-Aid. Those users who do click the "play" button will be secretly tracked as they navigate the White House Web site--and if those users have visited YouTube or any other Google-run Web site in the past, the fact that they watched an Obama video will be added to the existing massive pile of data the company has compiled on each of them.
Simply put, there is no good reason for Google to be able to data mine a citizen's interaction with the president--especially when watching a video that was produced and uploaded by the White House at the taxpayers' expense.
The White House is already making use of Akamai's commercial edge caching services, and the transition team made full use of Amazon's Simple Storage Service for the download-friendly version of Obama's weekly address. Rather than using YouTube, the State Department has for some time opted to pay for a commercial, flash-based video streaming solution provided by Brightcove for its propaganda information site America.gov.
If the Obama team is willing to pay for some of its Web 2.0 technology, why can't they also follow the State Department's lead and cough up a few bucks for a streaming video service that doesn't cross-subsidize its offerings by tracking the Web habits of users?
Finally, if the White House lawyers are going to waive long-standing federal privacy rules for YouTube, merely mentioning the existence of that waiver is not enough. Given Obama's much publicized commitment to transparency, I think it's quite reasonable to ask that the team post the text of each and every waiver to the federal cookie policy to its Web site. Members of the public have a right to know the reasons that were used to justify exempting YouTube's cookies from these otherwise strict rules. If the YouTube waiver cannot withstand the analysis of legal experts and the ridicule of tech bloggers, it probably shouldn't have been authorized.
The White House Web site has been live for just three days, and in just the past day, Obama's administration has given us some reason to believe that it takes Web privacy seriously. Over the next few weeks, it'll have a chance to prove it.
Should members of the public be able to pay for Web advertisements detailing which companies have donated to politicians? While this seems like a great way to promote transparency in politics, Google forbids the practice--we are free to name the politicians who take money but cannot name the companies that give it.
With Google's domination of the search engine market, and the eyeballs that go along with it, the company's AdWords text ads have become a key way for activists, politicians, and corporations to reach the general public. However, over the past year, Google's excessively restrictive policies have resulted in the censorship of lawful advertisements that educated and informed the public.
In one of the cases, involving religious groups placing anti-abortion ads, Google backed down. As this post will explore, Google's rather absurd, and little known, trademark policy seriously harms the ability of citizens to highlight the donations made to politicians by large corporations.
Trademarks and AdWords
Over the past few years, Google has waged numerous legal battles in order to allow its advertising customers to purchase keyword ads for trademarked phrases. Thus, for example, Nike can make sure that ads for its shoes show up when a Web surfer searches Google.com for Reebok.
Under Google's current trademark policy, Nike can purchase advertisements that will display information for the company's own shoes, such as "Visit Nike.com to get great deals on shoes," but Google forbids anyone but a trademark owner from using a trademarked phrase in an ad. Thus an ad stating that "Nike shoes are worn by Barack Obama, not Reebok" would be forbidden, even if Nike could prove it were true.
This example with two large corporations battling it out doesn't really tug the heart strings. But what about the following few examples of ads, all of which are currently forbidden as per Google's trademark policy?
- A labor rights group that wished to place an ad stating that "Wal-Mart forbids its employees from unionizing," whenever someone searched for the phrase "minimum wage."
- A public-interest group that wished to place an ad stating that "The RIAA has filed over 30,000 lawsuits against Internet users, many of whom were children, elderly, or even dead," whenever a Google user searched for the words "file sharing."
- An activist who wished to place an advertisement stating that "AT&T has given $7,500 since 2004. Who else has donated to the senator?" The ad would be displayed when Internet users searched for the name of a particular politician.
While these first two examples are hypothetical, the final one has actually been censored by Google. I know, because a few weeks ago, Google informed me that an ad campaign that I had run for the last 5 months was being terminated due to a trademark complaint by AT&T.
No sunshine allowed
As regular readers of this blog will know, I dabbled in a bit of tech policy activism in the state of Indiana earlier this year, working on a data breach bill that eventually became law. During the process of getting that bill through committee, I had a nasty run-in with a state senator who didn't take too kindly to my blogging and was willing to hold up my bill as a way to force me to censor my criticism of his colleagues.
Once I left Indiana in May, I promptly registered multiple domain names for Republican State Senate whip Brandt Hershman, www.Brandt-Hershman.com and www.BrandtHershman.com. Both domains point to a single Web page that lists every campaign donation that Sen. Hershman has received, from all corporations, for the history of his political career.
In addition to setting up this Web site, I also placed a Google ad campaign so that anyone searching for "brandt hershman," "senator hershman," or a few other similar keywords would see an advertisement pointing to my site:
What does money buy?
AT&T has given $7,500 since 2004.
Who else has donated to the senator?
www.Brandt-Hershman.com
From June until December of this year, the ad ran without any complaints. However, on December 5, Google notified me that it had suspended my advertisement, based on a trademark complaint:
Thank you for advertising with Google AdWords. After reviewing your account, we've found that one or more of your ads or keywords does not meet our guidelines.
Ad Issue(s): Trademark in Ad Content
SUGGESTIONS:
-> Ad Content: Please remove the following trademark from your ad: AT&T.
When I appealed the suspension of the ad, Google replied with a bit more information, informing me that AT&T had complained about my use of the company's trademark:
Thank you for your email. I understand you're concerned that the term(s) AT&T has been disapproved in your account as a trademark.
Please note that we received a complaint from the trademark owner of AT&T. In their complaint, the trademark owner stated that they are the owner of the mark and that its use in certain advertisements is not authorized. Therefore, your ad was disapproved.
Google's policies, in depth
Google's official policy confirms its zero-tolerance stance toward trademarks in advertisements:
When we receive a complaint from a trademark owner, we only investigate the use of the trademark in ad text. If the advertiser is using the trademark in ad text, we will require the advertiser to remove the trademark and prevent them from using it in ad text in the future.
Google permits trademark owners to submit blanket complaints regarding the use of their mark in advertisements. This means that with just one request, a company can force the removal of every single advertisement that contains the trademark, even if the use is legitimate and lawful.
It's useful to compare Google's trademark and copyright policies. If a copyright owner (say, the Church Of Scientology or Viacom) wishes to force the removal of a link from the Google search index or videos from YouTube, that company must send an individual request for each file or Web site.
If Viacom wants to have 100 episodes of The Daily Show removed from YouTube, it takes 100 requests. However, if Viacom wants to force the takedown of 100 different advertisements that mention The Daily Show, it only takes a single request.
The requirement that copyright owners send individual takedown requests is an important speed bump that protects the fair-use rights of end users, who might be incorrectly accused of violating copyright. No such protection currently exists for Google AdWords customers who wish to lawfully comment on or critique companies whose names are trademarked.
Legal analysis
To make sure that I wasn't making a fuss out of nothing, I spoke to a number of prominent legal experts, all of whom shared my concern regarding the impact on free speech and transparency in politics.
First, I spoke with Wendy Seltzer, a fellow at Harvard's Berkman Center (disclosure: I am also a fellow at Berkman) and founder of the Chilling Effects Clearinghouse. She told me that:
Google should be concerned that its actions here may actually hurt its (and its users') ability to use trademarks for comparative and search purposes later. Google is now a large enough part of our Internet experience that its concessions to trademark bullies in AdWords could condition readers to think--incorrectly--that all uses of a trademark must be authorized by the trademark holder...
We need to resist this chipping-away at our rights to use brands to speak about the products they promote and things their owners do, and Google, as a major beneficiary of our prodigious use of language, should help us to do so.
Jim Harper, director of information policy studies at the Cato Institute, shared similar concerns:
What (Google) seems to be doing is accepting any complaint as conclusive proof that a trademark violation is occurring. This is a very poor practice, and it grants trademark owners power well beyond their legal rights. On a platform as important as Google's, that will result in a significant diminution of communication about corporations and, in this case, politicians too.
While he was concerned about the impact on free speech, Eric Goldman, a professor at the Santa Clara University School of Law, expressed some sympathy for Google, due to the risk of litigation by trademark owners:
Presumably, AT&T has requested Google not to let any advertisers display "AT&T" in the ad copy--whether the advertisers are competitors, pirates or political speakers. Google is within its legal rights to do so, and there is some legal support for Google's position.
However, unquestionably, Google's policy precludes legitimate trademark references such as yours.
This is not a good situation, but before we criticize Google too harshly, note that they face legal risks whatever they do, and they have tried to find a compromise solution...
Trademark law is so ridiculously expansive that Google feels compelled to implement illogical and chilling policies, so (in my opinion), the real villain is trademark law, not Google.
As both Goldman and Harper told me, Google is perfectly within its rights to refuse to display my advertisement, just as a newspaper or TV station can refuse to run an ad. However, just as newspapers routinely publish advertisements that criticize companies, so, too, could Google, if it wished to.
The only recourse available to activists wishing to change Google's policies is thus shame--a tactic that has worked pretty well in other similar situations.
Freedom of Speech and Abortion
Earlier this year, a British anti-abortion organization sued Google, after the search engine refused to display an advertisement that the group had sought. The text of the ad was:
U.K. Abortion law
Key views and news on abortion law from The Christian Institute
www.christian.org.uk
Before the lawsuit, Google's policy did not permit ads promoting Web sites that contained abortion and religion-related content. After a significant amount of bad press, and the settlement of the suit (brought under the United Kingdom's Equality Act), Google reversed itself.
Google's new policy allows religious associations to place ads "in a factual and campaigning way," a Google spokesperson told the British media. She went on to describe the policy in more detail:
This means that their ads need to aim to educate and inform, not to shock. The ads can refer to government legislation, and existing law, and the alternatives to abortion. But, they cannot link to Web sites which show graphic images that aim to shock people into changing their minds.
Outside of the online-advertising space, U.S. telecommunications giant Verizon Communications caused a huge media firestorm in 2007, when it blocked short text message alerts by NARAL, a pro-choice group.
Within days of its anti-free-speech blunder, Verizon quickly backtracked. However, by then, the damage to its reputation was done. Both Congress and the FCC took an interest in the incident, leading to threats of oversight and investigation.
Obviously, abortion is a hot-potato issue that no Fortune 500 company wishes to get caught in the middle of. However, the issue for both Google and Verizon was the same--the companies sell products that enable people to communicate with each other. When they start deciding which kinds of information are appropriate to send, they risk a significant public outcry, as well as the attention of both regulators and Congress.
With any luck, Google will realize that its flawed AdWords trademark policy is hurting free speech and efforts to promote transparency in government. If it doesn't, we all suffer.
Update at 9:30 a.m. PST: Video audience figures have been updated.
President-elect Barack Obama has now posted his second weekly address to YouTube, and it has already gotten more than 411,000 views. A week ago, I criticized the use of YouTube by Obama's transition team, calling it a no-bid giveaway to the Google-owned video-sharing site.
The solution I called for then--the adoption of BitTorrent as the official distribution platform for Change.gov--was, admittedly, a pipe dream.
In this post, I'll explain why the government needs to step up and host its own videos and why it is simply improper to rely on YouTube to foot the bandwidth bill for Obama's messages to the people. I will also make the case that the use of YouTube and Google Analytics by the Obama transition team violates the privacy of Web site visitors and possibly even violates federal rules banning the use of permanent tracking cookies on government sites.
YouTube as the platform of choice
The announcement a couple weeks ago of Obama's decision to use YouTube for his weekly addresses led to headlines across the world. The president-elect's use of streaming video technology was hailed as revolutionary or, as one transition team rep gushed, "just one of many ways that he will communicate directly with the American people and make the White House and the political process more transparent."
Obama's team uploaded his first video address to YouTube (928,000+ views), AOL (220+ views), Yahoo (8,400+ views), and MSN (545+ views)--all figures as of Monday morning.
In keeping with the spirit of this posting, the above video is not embedded.
(Credit: YouTube)
For his second weekly video, the Obama team seems to have ditched AOL and only uploaded the video to YouTube, Microsoft's MSN, and Yahoo. Web 2.0 start-ups such as Veoh, Vuze, Revver, and Blip.tv have not gotten any love.
While the transition team should be commended for uploading the video to multiple sites (albeit all owned by multibillion-dollar tech titans), the difference in the number of views is rather startling. Without access to accurate stats (which are not public), it is tough to know how many YouTube views came from people viewing the video embedded into the Change.gov site, searching YouTube, or watching a copy embedded into a personal blog or other news site.
However, I do think it is fairly reasonable to assume that a decent percentage of those nearly 1 million views came from people visiting Change.gov, the taxpayer-funded, official site of the Obama transition team. It is those hundreds of thousands of viewers who clicked the play button to load and stream a video embedded from YouTube's servers that are the focus of this post.
Privacy risks
YouTube, like many other sites, uses persistent cookies to track repeat visitors. Thus, when a regular YouTube user views a video embedded in a blog or other third-party site, the user's cookie is automatically sent to YouTube's servers--even without the user clicking the play button. Given the widespread use of embedded videos, this gives Google, which owns YouTube, an even better idea of the surfing habits of millions of people around the world.
And even if you believe Google's "do no evil" motto, it seems at least a little bit creepy for the company to track each time someone visits Change.gov--especially when that person doesn't actually press the play button to watch Obama's latest message to the people.
The privacy risks associated with the widespread use of embedded videos have caused significant concern among privacy activists--enough for the folks at the Electronic Frontier Foundation to develop the privacy-preserving MyTube tool for Webmasters. If the Obama team insists on sticking with YouTube embeds, perhaps it will at least consider deploying MyTube to protect the privacy of citizens who visit the official transition site.
The privacy risks aren't just limited to YouTube.
Just a week ago, Dan Goodin at The Register criticized the use of the Google Analytics Web-tracking code in the Change.gov site--which also sets a permanent tracking cookie. Although he mostly focused on security risks, and not privacy-related threats, he blasted Obama's Web design team, stating that:
The failure of Obama's Webmasters to follow anything remotely like best practices is more than a little troubling because it suggests they don't fully grasp the security realities of living in a Web 2.0 world.
Eight years ago, the issue of cookies tracking users on government sites was a fairly big issue in tech policy circles, drawing the attention of those in Congress. Eventually, the Office of Management and Budget issued a directive that forbade the use of persistent cookies on federal agency sites.
The Obama team's use of both YouTube and Google Analytics raises serious privacy concerns and likely clashes with the OMB directive.
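The distinction the directive turns on, session versus persistent cookies, comes down to whether the Set-Cookie header carries an expiry. The sketch below classifies a header using Python's standard cookie parser. The cookie names and values are made up for illustration and are not the actual cookies set by YouTube or Google Analytics.

```python
from http.cookies import SimpleCookie


def is_persistent(set_cookie_header):
    """Return True if any cookie in the header carries Expires or Max-Age.

    A cookie with neither attribute is a session cookie and is discarded
    when the browser closes. Simplification: this does not treat
    Max-Age=0 (a deletion request) as a special case.
    """
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    return any(m["expires"] or m["max-age"] for m in cookie.values())


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    print(is_persistent("visitor_id=abc123; Max-Age=31536000; Path=/"))
    print(is_persistent("session_id=xyz789; Path=/"))
```

A check like this covers only browser cookies delivered over HTTP and would not notice data stored by a Flash player.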
If Obama's transition team can afford to lease a jet for the president-elect and to pay for staff salaries, BlackBerrys, and hotel rooms, why can't it also pay for a few Web servers capable of serving up Flash video?
(Credit: Change.gov)
To be clear, Change.gov is not creating or requesting its own persistent cookies. However, due to the embedding of YouTube videos and Google Analytics Web-tracking code in the site, visitors will be transmitting cookies to Google's servers. Since the YouTube cookies are not set directly by the Change.gov servers, it is unclear whether the Google cookies violate the specific OMB directive. Even if they do not, they clearly violate the intention of the rule--which was created in the days before embedded videos or third-party-hosted Javascript.
The official privacy policy listed at Change.gov makes no mention of cookies, nor of the collection of visitor information by Google's servers. The privacy policy does, however, pledge "not to make personal information available to anyone other than our employees, staff, and agents." At best, the Obama team copied a boilerplate privacy policy from somewhere else and overlooked the use of YouTube and Google Analytics. At worst, it seems pretty deceptive.
When reached for his thoughts, Marc Rotenberg, executive director of the Electronic Privacy Information Center, told me:
On the upside, the transition people have done a good job with the ethics in government rules for transition team members. Now they need to revise the Change.Gov Web site and respect the rights of citizens who are seeking information about the new administration.
Lots of traffic
The low-quality YouTube video embedded into the Change.gov blog is 7MB. Multiplied by more than 900,000 views, that means Obama's first video consumed over 6 terabytes of bandwidth. If the Obama team had to pay for the data, instead of getting it for free from YouTube, it would have cost nearly $1,000, at least if it used Amazon.com's S3 cloud-hosting service.
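For readers who want to check the math, here is a rough sketch of the bandwidth calculation. It assumes decimal units (1 TB = 1,000,000 MB) and treats "more than 900,000 views" as exactly 900,000, so the result is a floor rather than a precise figure. The per-gigabyte rate is not a quoted Amazon price; it is simply what the "nearly $1,000" estimate implies when divided by the total transfer.

```python
# Figures taken from the post: a 7MB file and "more than 900,000" views.
VIDEO_SIZE_MB = 7
VIEWS = 900_000


def total_transfer_tb(size_mb, views):
    """Total data served, in decimal terabytes (1 TB = 1,000,000 MB)."""
    return size_mb * views / 1_000_000


def implied_rate_per_gb(total_cost, size_mb, views):
    """Per-gigabyte rate implied by a total cost (1 GB = 1,000 MB)."""
    total_gb = size_mb * views / 1_000
    return total_cost / total_gb


transfer = total_transfer_tb(VIDEO_SIZE_MB, VIEWS)
# Assumption: "nearly $1,000" is rounded up to 1000 for this estimate.
rate = implied_rate_per_gb(1000, VIDEO_SIZE_MB, VIEWS)

print(f"{transfer:.1f} TB served")
print(f"about ${rate:.2f} per GB implied")
```

In other words, the estimate works out to roughly 16 cents per gigabyte served.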
While YouTube did not serve any advertisements within or around Obama's chat, each of those 900,000+ viewers did see YouTube's name prominently placed within the Change.gov site (as a watermark in the bottom corner of the video). Once the three-minute video is over, viewers are given the ability to watch other related videos (which might have advertisements) or, with one click, to navigate directly to the Google-owned video-sharing site, which certainly has advertisements.
Furthermore, I'm sure that Google's PR team was absolutely overjoyed with the thousands of newspaper articles that flatteringly tied the president-elect to the video-sharing platform. While all press is good press, it is likely such Obama-related press is even better.
Defaults matter
The Obama team's uploading of its weekly videos to YouTube is fine--provided, as is currently the case, that it also uploads the videos to a few other places. As the videos are not copyrighted, members of the public are free to redistribute them via other platforms (as the LegalTorrents P2P site has done), and even mash them up. This is great, and I support this embrace of Internet distribution by the president-elect's team of geeks.
I do, however, have a problem with the use of YouTube-hosted embedded videos on the official Change.gov site.
The transition team has a budget of over $12 million. If it can afford to lease a jet for Obama and to pay for staff salaries, BlackBerrys, and hotel rooms, why can't it also pay for a few Web servers capable of serving up Flash video? Isn't it a bit tacky for the federal government to be relying on Google to host its videos?
It's as if the entire Obama transition team has adopted Hotmail's free e-mail service for its daily communications--with each e-mail sent by an Obama adviser followed by a signature pitching one of Microsoft's products: "See how Windows Mobile brings your life together--at home, work, or on the go."
Obama raised half a billion dollars through online donations during his campaign. His was the first presidential campaign to employ a chief technology officer (a computer geek formerly at the travel site Orbitz). These guys know what they're doing when it comes to technology; they design beautiful, interactive sites and have relied upon complex data-mining algorithms to profile and target individual voters and donors. If they wanted to, they'd have no problem installing a few dozen Adobe Systems Flash streaming servers. However, since YouTube will gladly foot the bill, the Obama team hasn't felt the need.
During his campaign for the presidency, Obama didn't call for a Web 2.0 government, but for a Google government--something that CEO Eric Schmidt, who is now serving as one of Obama's economic advisers, was probably very happy to hear. While I love conspiracy theories as much as the next guy, I don't really see one here. However, given the close connection between Obama and several higher-ups at Google, it is better to avoid the appearance of a conflict of interest.
Thus, it is time to bring an end to embedded YouTube videos on Change.gov. By all means, use streaming video to reach the masses, but let the bits flow from government-owned servers (preferably without privacy-invading cookies). If bloggers wish to embed YouTube videos of the speech on their own sites, that is fine. But Obama shouldn't.
Disclosure: I was a technology fellow at the Electronic Privacy Information Center in spring 2008, where I worked on social-networking-related issues. I also worked for Google as a summer intern in 2006, received two Google fellowships, and currently use the Google Analytics tracking tool for my personal site.
Calling for the separation of Google and State.
The news that President-elect Barack Obama will be using YouTube to distribute his weekly "radio" address has been met by general fanfare among the digerati.
This might seem like a bold move--and compared with the relatively boring podcast MP3s of Bush's weekly speech hosted at Whitehouse.gov, it is. However, putting President-elect Obama's video podcasts on YouTube is hardly Change We Can Believe In.
By exclusively hosting his videos at YouTube, the Google-owned dominant player in the user-generated video industry, the Obama campaign has effectively issued its first no-bid giveaway of the next administration.
If Obama really wants to demonstrate his Web 2.0 bona fides and prove that he's actually interested in shaking things up, he'll use BitTorrent, the disruptive file-sharing tool that arguably dwarfs YouTube in popularity.
Let's explore a few reasons why Obama should ditch his YouTube plans and switch to BitTorrent:
- As demonstrated by the recent flood of constituent complaints to the House and Senate during the banking bailout, the .gov network simply can't deal with lots of traffic.
- It's not the government's role to pick industry winners and losers. Sure, YouTube has millions of users, but I'm sure that the other Silicon Valley-based user-submitted video sites would love to draw the eyeballs of Obama's podcast subscribers. What about Veoh, Vuze, Hulu, Revver, and Blip.tv?
- While it's awfully nice of Google-YouTube to volunteer the hundreds of gigabytes of bandwidth necessary to host Obama's video content, is it really appropriate to further expand the link between Google and the Obama White House?
Google CEO Eric Schmidt already has Obama's ear as a member of his economic advisory board; the Obama campaign has likely paid hundreds of thousands of dollars to Google for AdWords advertising during the campaign; and Google.org's Sonal Shah has landed a key role on Obama's transition committee. Simply put, things are already close enough between Change.gov and the Google Gang.
- There are no copyright issues--since the videos will be made by the federal government, they are automatically in the public domain. Thus, it is perfectly OK for them to be shared via peer-to-peer technologies.
- It'd give Obama a reason to care about Net neutrality. Some on the left are already voicing fears that Obama will soften on his commitment to the Net neutrality cause. Once his weekly addresses are hosted via BitTorrent, he'll have a vested interest in keeping the pipes tamper free. In such a scenario, any anti-file-sharing shenanigans by Comcast or other ISPs would directly impact Obama's ability to speak to the people.
- The Canadians already do it: CBC--Canada's version of PBS--has had highly successful trials of BitTorrent as a low-cost, high-throughput method of distributing video content. Since we're hopefully going to copy the Canadians' obviously better health care system, why not similarly learn from their use of file sharing?
The time is right for the U.S. government to adopt BitTorrent. Mr. Obama, be bold, be brave, and upload to The Pirate Bay.
A tip of the hat to Aaron Shaw, who inspired this blog post in a conversation earlier today.
Question: You're a multibillion dollar tech giant, and you've launched a new phone platform after much media fanfare. Then a security researcher finds a flaw in your product within days of its release. Worse, the vulnerability is due to the fact that you shipped old (and known to be flawed) software on the phones. What should you do? Issue an emergency update, warn users, or perhaps even issue a recall? If you're Google, the answer is simple. Attack the researcher.
With the news of a flaw in Google's Android phone platform making The New York Times on Friday, the search giant quickly ramped up the spin machine. After first dismissing the amount of damage to which the flaw exposed users, anonymous Google executives then attempted to discredit the security researcher, Charlie Miller, who's a former NSA employee turned security consultant. Miller, the unnamed Googlers argued, acted irresponsibly by going to The New York Times to announce his vulnerability instead of giving the Big G a few weeks or months to fix the flaw:
Google executives said they believed that Mr. Miller had violated an unwritten code between companies and researchers that is intended to give companies time to fix problems before they are publicized.
What the Googlers are talking about is the idea of "responsible disclosure," one method of disclosing security vulnerabilities in software products. While it is an approach that is frequently followed by researchers, it is not the only method available, and in spite of the wishes of the companies whose products are frequently analyzed, it is by no means the "norm" for the industry.
Another frequently used method is that of "full disclosure"--in which a researcher will post complete details of a vulnerability to a public forum (typically a mailing list dedicated to security topics). This approach is often used by researchers when they have discovered a flaw in a product made by a company with a poor track record of working with researchers--or worse, threatening to sue them. For example, some researchers refuse to provide Apple with any advance notification, due to its past behavior.
A third method involves selling information on the vulnerabilities to third parties (such as TippingPoint and iDefense)--who pass that information on to their own customers, or perhaps keep it for themselves. Charlie Miller, the man who discovered the Android flaw, has followed this path in the past, most notably when he sold details of a flaw in the Linux kernel to the U.S. National Security Agency for $50,000 (PDF).
Google's poor track record
First, consider the fact that security is a two-sided coin. If Google wants researchers to come to it first with vulnerability information, it is only fair to expect that Google be forthcoming with the community (and the general public) once the flaw has been fixed. Google's approach in this area is that of total secrecy--not acknowledging flaws, and certainly not notifying users that a vulnerability existed or has been fixed. Google's CIO admitted as much in a 2007 interview with The Wall Street Journal:
Regarding security-flaw disclosure, Mr. Merrill says Google hasn't provided much because consumers, its primary users to date, often aren't tech-savvy enough to understand security bulletins and find them "distracting and confusing." Also, because fixes Google makes on its servers are invisible to the user, notification hasn't seemed necessary, he says.
Second, companies do not have a right to expect "responsible disclosure." It is a mutual compromise, where the researchers provide the company with advance notification in exchange for some form of assurance that the company will act reasonably, keep the lines of communication open, and give the researcher full credit once the vulnerability is fixed.
Google's track record in this area leaves much to be desired. Many top-tier researchers have not been credited for disclosing flaws, and in some cases, Google has repeatedly dragged its feet in fixing flaws. The end result is that many frustrated researchers have opted to follow the full-disclosure path, after hitting a brick wall when trying to provide Google with advance notice.
I can personally confirm this experience, after I discovered a fairly significant flaw in a number of commercial Firefox toolbars back in 2007. While Mozilla and Yahoo replied to my initial e-mail within a day or so and kept the lines of communication open, Google repeatedly stonewalled me, and I didn't hear anything from them for weeks at a time. Eventually, Google fixed the flaw a day or two after I went public with the vulnerability, 45 days after I had originally given the company private notice. As a result, I have extreme sympathy for those in the research community who have written Google off.
A rather unimpressive vulnerability
Once we actually look into the details of the vulnerability, and Miller's disclosure, the situation looks even worse for Google.
A known vulnerability: The Android platform is built on top of more than 80 open-source libraries and programs. This particular flaw had been known about for some time and already fixed in the current version of the open-source libraries. The flaw in Google's product only exists because the company shipped out-of-date software, which was known to be vulnerable.
Advance notice: While the anonymous Google executives criticized Miller for not following responsible disclosure practices, it is worth noting that the researcher did provide Google with early notice--informing the company on the 20th of October. It is also important to note that Miller and his colleagues have yet to actually provide full information on the vulnerability or a working proof-of-concept exploit to the security community. Thus, it can hardly be said that Miller followed the full-disclosure path.
If Google can criticize Miller at all, it cannot be for not warning the company, but perhaps for not providing them with enough warning. However, given that Google shipped known-vulnerable software to hundreds of thousands of users, and that fixed versions of the vulnerable software packages have been available for some time, it is difficult for this blogger to sympathize with the folks in Mountain View.
Furthermore, given Mr. Miller's previous mercenaryish history of selling software vulnerabilities to the National Security Agency (which presumably used the flaws to break into foreign government computers, and not in order to fix the vulnerable software), we should be happy that he is at least now sharing the existence of this flaw with the public. At least this way, developers have a good chance of finding and fixing it.
Disclosure: In the summer of 2006, I worked as an intern for the Application Security Team at Google. Furthermore, between 2003 and 2005, I was a student at Johns Hopkins University and was advised by Prof. Avi Rubin, who is one of the founders of Independent Security Evaluators, the company that employs Charlie Miller. A couple of my former colleagues also now work for ISE. I have not spoken with them (or anyone at Google) about this article.
John McCain's presidential campaign has discovered the remix-unfriendly aspects of American copyright law, after several of the candidate's campaign videos were pulled from YouTube.
McCain has now discovered the rights-holder-friendly nature of the Digital Millennium Copyright Act, which forces remixers to fight an uphill battle to prove that their work is a "fair use."
However, instead of calling for an overhaul of the much-hated law, McCain is calling for VIP treatment for the remixes made by political campaigns.
McCain's proposal: complaints about videos uploaded by a political campaign would be manually reviewed by a human YouTube employee before any possible removal of the remix. The process for complaints against videos uploaded by millions of other Americans would stay the same: instant removal by a computer program, and then possible reinstatement a week or two later after the video sharing site has received and manually processed a formal counter-notice.
With 11 homes and 13 cars, it's not terribly surprising that McCain is calling for special treatment for the YouTube videos of politicians. As for the "fair use" claims of the poor starving masses: Let them eat cake.
On Tuesday, the McCain campaign sent a formal letter to YouTube asking for this two-tier system for "fair use" complaints. Copyright-guru Larry Lessig called it a "fantastic letter", adding "bravo to the campaign" in a post to his blog. Since then, the technology press has been pretty supportive, although the focus of the coverage seems to mainly be along the lines of "McCain realizes that fair use claims are an uphill battle." This is the wrong message to send, and as much as I respect Professor Lessig, I have to call him out here. He is wrong. McCain should be criticized for his attempt to get special treatment, and Google/YouTube need to treat all users the same way.
All claims of fair use are equal--yet some claims are more equal than others.
The only way we will get an effective overhaul of copyright laws will be by forcing politicians to suffer along with the masses. The minute a special set of rules is made for those in Congress, the incentive to fix the system will disappear. To drive this point home, consider the following:
During the confirmation hearings for Judge Robert Bork, the Washington City Paper obtained a copy of the Republican nominee's video rental records. Alarmed at the possibility that their own rental histories would be revealed by the press, members of Congress jumped to pass comprehensive privacy legislation for the video rental records of all Americans. Up until the Bork fiasco, there had been no real incentive to fix anything, but once the risk to their own records was made clear, Congress acted. As a result, we are now all protected by the 1988 Video Privacy Protection Act.
Compare this to the horrible situation at airports. Americans are routinely harassed, prodded, poked and humiliated by employees of the Transportation Security Administration. While we stand in line like sheep, congressmen get to skip through the security lines, avoiding the entire process. Given the fact that they don't have to suffer at the hands of TSA, it's not terribly surprising that they have little incentive to fix the problems faced by the rest of us.
These two examples should make it clear--we cannot allow politicians to receive special treatment in copyright and fair use disputes. If anything, campaign videos should receive substandard treatment. McCain's videos deserve to rot in purgatory at the back of the DMCA queue, behind videos of toddlers, skateboarding dogs, Star Wars Kid remixes, and the hundreds of clips of the dramatic chipmunk. Perhaps then, the senator will throw his weight behind comprehensive copyright reform that'll result in real benefits for the rest of the remix-population.
Google announced on Monday that the company will be reducing the amount of time that it will keep sensitive, identifying log data on its search engine customers. To the naive reader, the announcement seems like a clear win for privacy. However, with a bit of careful analysis, it's possible to see that this is little more than snake oil, designed to look good for the newspapers, without delivering real benefits to end users.
In a post to the company blog on Monday, the company announced that it will be significantly reducing the amount of time that it hangs onto identifying user data in its Web server logs:
Today, we're announcing a new logs retention policy: we'll anonymize IP addresses on our server logs after 9 months. We're significantly shortening our previous 18-month retention policy to address regulatory concerns and to take another step to improve privacy for our users.
Hidden further down in the blog post were a few more details:
We haven't sorted out all of the implementation details, and we may not be able to use precisely the same methods for anonymizing as we do after 18 months, but we are committed to making it work.
Google's announcement was extremely light on details, specifically, how the company planned to anonymize the records after 9 months. I contacted Google to find out more, and received an extremely interesting reply:
After nine months, we will change some of the bits in the IP address in the logs; after 18 months we remove the last eight bits in the IP address and change the cookie information. We're still developing the precise technical methods and approach to this, but we believe these changes will be a significant addition to protecting user privacy.... It is difficult to guarantee complete anonymization, but we believe these changes will make it very unlikely users could be identified.... We hope to be able to add the 9-month anonymization process to our existing 18-month process by early 2009, or even earlier.
To understand what this means (and how useless the new privacy "enhancements" are), consider the following:
When a user conducts a search using Google's search engine, the company stores three main types of information in a log file: the user's IP address (which is a unique network address given to her computer by her Internet service provider), the words that she searched for, and her cookie identifier (a unique value given to every Web-browser that visits a Google Web-property).
As per Google's existing policy, after 18 months Google "anonymizes" the IP address and cookie information from its logfiles. While the company hasn't said how it de-identifies the cookies, it has revealed in public statements that its IP anonymization technique consists of chopping off the last 8 bits of a user's IP address.
As an example, an IP address of a home user could be 173.192.103.121. After 18 months, Google chops this down to 173.192.103.XXX.
Since each octet (the numbers between each period of an IP) can contain values from 0 to 255, Google's anonymization technique allows a user, at most, to hide among 255 other computers. In comparison, Microsoft deletes the cookies, the full IP address and any other identifiable user information from its search logs after 18 months.
Google has now revealed that it will change "some" of the bits of the IP address after 9 months, but fewer than the eight bits that it masks after the full 18 months. Thus, instead of Google's customers being able to hide among 255 other Internet users, perhaps they'll be able to hide among 63 or 127 other possible IP addresses.
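To make the masking concrete, here is a minimal sketch in Python. It assumes that "chopping off" bits means zeroing the lowest bits of the address; Google has not published its actual method, so the function names and the zeroing approach are illustrative only. Counting every value from 0 through 255, masking eight bits collapses 256 addresses into one, which leaves a user 255 others to hide among.

```python
import ipaddress


def mask_low_bits(ip, bits):
    """Zero the lowest `bits` bits of an IPv4 address (assumed masking method)."""
    value = int(ipaddress.IPv4Address(ip))
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


def others_to_hide_among(bits):
    """Other addresses that collapse to the same masked value."""
    return 2 ** bits - 1


# The example address from the post, masked at three possible widths.
for bits in (6, 7, 8):
    masked = mask_low_bits("173.192.103.121", bits)
    print(bits, masked, others_to_hide_among(bits))
```

Even at the full eight bits, the crowd is tiny compared with the roughly four billion possible IPv4 addresses.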
By itself, this is a laughable level of anonymity. However, it gets worse.
First, remember that Google will not delete or anonymize user cookies from the logs when it slightly smudges IP addresses after nine months. Second, remember that as long as you use a Google Web property at least once every two years, the company will maintain a unique identifiable cookie value within your Web browser.
Thus, consider the following scenario:
In June 2008, a user from 173.192.103.121 with cookie value 12345 conducts a search for "breast cancer risks." Nine months later, in March 2009, the company scrubs some portion of the IP address, perhaps to 173.192.103.1XX. However, the cookie remains in the log.
In April 2009, that same user returns to Google, and conducts a search for "stephen colbert youtube videos," again from the same IP and the same cookie value 12345.
Even though the 9-month-old search logs have been "anonymized", because the cookie values remain, it is trivial to match the newer search results to the older searches, and thus completely reverse the anonymization process.
The simple truth is that any IP anonymization technique, no matter how strong or weak, is simply a waste of time, if cookie values are not also anonymized.
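The scenario above can be sketched in a few lines of Python. The log records, field names, and the "XX" masking notation are all hypothetical, modeled on the example in this post rather than on Google's real log format. The point is only that a shared cookie value is enough to stitch a masked record back to a full IP address.

```python
from collections import defaultdict

# Hypothetical log records modeled on the scenario above.
logs = [
    {"date": "2008-06", "ip": "173.192.103.1XX", "cookie": "12345",
     "query": "breast cancer risks"},
    {"date": "2009-04", "ip": "173.192.103.121", "cookie": "12345",
     "query": "stephen colbert youtube videos"},
]


def is_masked(ip):
    """A record counts as masked if any digit was replaced with X."""
    return "X" in ip


def deanonymize(records):
    """Restore masked IPs using full IPs seen with the same cookie."""
    full_ips = defaultdict(set)
    for record in records:
        if not is_masked(record["ip"]):
            full_ips[record["cookie"]].add(record["ip"])

    restored = []
    for record in records:
        candidates = full_ips.get(record["cookie"], set())
        # Only restore when the cookie maps to exactly one full address.
        if is_masked(record["ip"]) and len(candidates) == 1:
            record = {**record, "ip": next(iter(candidates))}
        restored.append(record)
    return restored


for record in deanonymize(logs):
    print(record["date"], record["ip"], record["query"])
```

Both searches end up tied to the same full address, which is exactly the outcome the masking was supposed to prevent.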
Unfortunately, Google is relying on the fact that the mainstream media (I'm looking at you, New York Times and Washington Post) are clueless on these issues, as is seemingly most of the technology press. Google's new anonymization policy is totally worthless, and the company deserves to be called out for its deception.
Disclaimer: I interned at Google during the summer of 2006 and received a $5,000 Google fellowship in both 2006 and 2007. I have also interned or worked for both the Electronic Privacy Information Center (EPIC) and the American Civil Liberties Union (ACLU) of Northern California, public interest groups that have been extremely critical of Google's privacy policies.
European regulators sent shock-waves through the search engine industry earlier this week, when they proposed significantly tighter rules for logging data. If the EU adopts the proposed rules, Google, Yahoo and Microsoft will have to significantly reduce the amount of time they keep identifying search logs, and will have to start treating IP addresses as personally identifiable data -- something that Google has been particularly vocal against.
Google has recently engaged in a major public relations effort to try and make a credible argument for keeping log data. The company has trotted out respected employee researchers to try and make the case that deleting such data will hurt search results. When all of their claims are analyzed, however, one thing becomes clear: It's all about the money (and the clicks).
Google has a genuine need to retain detailed log information on one kind of user: Those who click on ads. However, in order to avoid creating a situation where only clickers lose their privacy, the company logs data on all searchers instead. That is, the privacy of millions is threatened, to protect the incentive for users to click on ads.
The excuses
Over the last few months, a number of Google's engineers have issued public statements on the company's public policy blog to defend its much criticized log data retention policies. The company claims that the data can be used to hunt down malware, to catch people defrauding its advertising system, and can be used to improve search results, especially for localized results.
Google claims that accurate logging data can improve localized searches. This data is then used to intelligently respond to searches, such that a search for "GM" will result in General Motors related information for an American search user, yet someone in France would be presented with information on "Guerre Mondiale" (World War).
What Google has done here is attempt to muddy the waters of the debate. Yes, accurate logging data improves localized searches. However, the company does not need to retain the exact network address (known as an IP address) of each and every search. Rather than tracking my searches by my network address, 129.53.136.23, the company could log that I came from San Francisco, California. That, in itself, would be more than enough information to help it localize and improve search results.
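As a rough illustration of that alternative, the sketch below resolves an address to a region at logging time and then discards the address. The lookup table is entirely hypothetical: the single network-to-city mapping is invented to match the example in this post, and a real system would consult a proper geolocation database.

```python
import ipaddress

# Hypothetical lookup table for illustration only; this mapping is an
# assumption made to match the example above, not real allocation data.
REGION_BY_NETWORK = {
    ipaddress.ip_network("129.53.0.0/16"): "San Francisco, California",
}


def coarse_log_entry(ip, query):
    """Build a log record that keeps a coarse region instead of the IP."""
    address = ipaddress.ip_address(ip)
    region = "unknown"
    for network, name in REGION_BY_NETWORK.items():
        if address in network:
            region = name
            break
    # The IP address itself is deliberately left out of the record.
    return {"region": region, "query": query}


entry = coarse_log_entry("129.53.136.23", "GM")
print(entry)
```

The stored record still says enough to localize a search for "GM", but it no longer points at one particular computer.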
Avoiding disincentives
Of all the excuses that Google's puppets have presented for retaining search logs, there is only one case where Google actually has a legitimate need to store information that identifies the individual user and network address: advertising clicks.
Google is an advertising company first, and a search engine second. Sometimes, we forget this, but Google has a lot of bills to pay. After all, those free meals and massages for employees have to be paid for somehow.
Google displays text advertisements on all of its web search results pages. Advertisers, for the most part, pay per click. That is, every time a user clicks on one of the ads, Google charges an advertiser a few cents (or dollars, depending on the search term). Because of the amounts of money at play, this tends to attract criminals wishing to defraud the system. Thus, it is not terribly surprising that Google wishes to retain information on the user who clicked.
What is most interesting to note, though, is that if a user does not click on one of Google's web advertisements, the only credible reason for retaining detailed search information becomes moot. If a user doesn't click, they can't possibly be engaged in fraud, and thus there is no reason to retain identifying information on the user's search.
Were Google to institute a logging policy based on its actual information needs, it would find itself in a curious position: users who clicked on advertisements would have detailed logs retained for months, if not years, while users who didn't click on ads would quickly have any identifying information scrubbed from logs, and replaced with more generalized info.
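Such a policy would be simple to express. The sketch below is purely hypothetical: the record fields and function names are invented for illustration, and none of it reflects how Google's logging actually works. It keeps the full record only when an ad was clicked and otherwise retains just the generalized fields.

```python
# Fields that survive scrubbing; everything else (IP, cookie) is dropped.
GENERAL_FIELDS = ("region", "query", "clicked_ad")


def scrub(record):
    """Keep only generalized fields from a log record."""
    return {key: record[key] for key in GENERAL_FIELDS}


def apply_retention_policy(record):
    """Retain identifying details only for searches that led to an ad click."""
    return dict(record) if record["clicked_ad"] else scrub(record)


# Hypothetical records: one searcher clicked an ad, the other did not.
clicker = {"ip": "173.192.103.121", "cookie": "12345",
           "region": "San Francisco, California",
           "query": "cheap flights", "clicked_ad": True}
non_clicker = {"ip": "129.53.136.23", "cookie": "67890",
               "region": "San Francisco, California",
               "query": "GM", "clicked_ad": False}

print(apply_retention_policy(clicker))
print(apply_retention_policy(non_clicker))
```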
The obvious problem with such a scenario would be that of incentives, especially once the policy was made public. Users would lose their privacy each time they clicked on an advertisement. Unfortunately for the company, this is exactly the wrong kind of message to send. It wants to encourage users to click on its text ads, not to provide incentives for customers to skip them.
Thus, in order to not create that situation, and to avoid the disincentive to click on ads, Google logs data on every search, by every user. And because of this, we all suffer -- even those users who never even see ads, because they use technologies like AdBlockPlus and CustomizeGoogle.
Disclaimer: In 2006, I worked as a summer intern in Google's click fraud team. Shuman Ghosemajumder, Google's "Business Product Manager for Trust & Safety" and the person claiming that search logs prevent fraud, worked in the same team.
None of the information in this blog post involves confidential company information.
I was awarded a Google fellowship in both 2006 and 2007, for $5000 each time. Finally, I just returned from a Scholar Retreat in San Francisco, which the company paid for.





