August 7, 2006 9:59 AM PDT

AOL apologizes for release of user search data

AOL apologized on Monday for releasing search log data on subscribers that had been intended for use with the company's newly launched research site.

The randomly selected data, which covered 658,000 subscribers and was posted 10 days ago, was among the tools intended for use on the recently launched AOL Research site. But the Internet giant has since removed the search logs from public view.

"This was a screw-up, and we're angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant," AOL, a unit of Time Warner, said in a statement. "Although there was no personally identifiable data linked to these accounts, we're absolutely not defending this. It was a mistake, and we apologize. We've launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again."

Although AOL had used identification numbers rather than names or user IDs when listing the search logs, that did not quell concerns of privacy advocates, who said that anyone among the 658,000 could easily be identified based on the searches each individual conducted.

"We think it's a major privacy concern, and we're glad to see AOL is taking it seriously," said Ari Schwartz, deputy director of the Center for Democracy and Technology. "Companies that deal in search results have to understand that they carry very sensitive information, even if it doesn't have what we would traditionally consider to be personally identifiable information involved."

Schwartz and other privacy advocates noted that with bits of information, a "mosaic" could be created that could eventually lead a person to identify the individual in question.

"Sometimes what people are searching for may be an indicator of who they are and who they know," said Richard Smith, founder of Internet security and privacy consulting firm Boston Software Forensics.

One search log in the data set included terms such as "how to tell your family you're a victim of incest," "casey middle school," "surgical help for depression," "can you adopt after a suicide attempt," "Fishman David Dr - 2.6 miles NE - 160 E 34th St, New York, 10016 - (212) 731-5345," "gynecology oncologists in new york city," and "how long will the swelling last after my tummy tuck."

Some researchers, however, contend the information serves a valuable purpose in helping to develop better information retrieval technology.

"Researchers at universities or small companies don't have access to this type of data. I think the (AOL) researchers were trying to do a good thing by making this available to the research community," said Steve Beitzel, who holds a doctoral degree in computer science from the Illinois Institute of Technology with a specialization in information retrieval. Beitzel, who is an affiliated researcher with the university's Information Retrieval Lab, once served as an intern at AOL, but was not involved with the release of the search log data.

In developing his doctoral thesis, Beitzel used another set of search data from AOL, unrelated to this recent issue, that focused on tracking trends in search query strings.

"It's a hot...research problem that people are trying to solve," he said.

Beitzel noted that the former Excite released a smaller data set of its users' search results in 1999 and 2001, and that AltaVista did something similar about five or six years ago.

Excite, as well as AltaVista, withheld users' names and IP addresses and used anonymous identifiers instead.

"They released the data sets more than five years ago, and it hasn't hurt anyone," Beitzel said. "The bloggers say what AOL did was evil and a violation of privacy. But this may be an overreaction...a nine-digit number in a search box with no name attached is meaningless."

Kurt Opsahl, a staff attorney for the Electronic Frontier Foundation, pointed to other means to make the information available to the research community without making it open to the public.

"There are ways of conducting research into search technology, without making individuals' search terms public," Opsahl said. "Universities could abide by AOL's privacy laws and various laws for privacy...They could get consent from users before handing out the information to third parties."

While Beitzel agreed other methods could be enacted to aid researchers and the search community, he advised against issuing filters to screen out information such as names or Social Security numbers.

"If you alter the collection, then it is no longer representative," he said.

The release of the search logs runs counter to a court ruling in March, when a federal judge rejected efforts by the Department of Justice to gain access to Google users' search logs. The court, however, determined the Justice Department could have limited access to Google's index of Web sites.

Google was the only search engine to fight the Justice Department, with Yahoo, Microsoft's MSN and AOL turning over their users' search data.

"All search engines collect this kind of user data, and it's valuable to marketers, insurance companies, people involved in divorce and custody battles," said Rebecca Jeschke, a spokeswoman for the EFF. "If this information is available, there is a lot of temptation to release it."

Smith, meanwhile, noted that the information AOL provided is similar to the type of search string information the Justice Department sought under the Child Online Protection Act.

The search log data, culled from March to May, represents approximately 1.5 percent of AOL's search network in May. The data applied only to U.S. searches by AOL subscribers using the company's client software.

A number of blogs are pointing to mirror sites to let people take a peek at the search logs of AOL users.

CNET News.com's Declan McCullagh contributed to this report.

8 comments

Is this a bit too late?
The file has already circulated around the net and is being discussed and analyzed. Forums such as aolsearchlogs.com have been set up.
Posted by ptkventures (1 comment)
Hopefully!
Hopefully this is another nail being self-driven into the coffin of THE worst Internet company ever.
Posted by firesuite (10 comments)
The worst Internet company ever?
I'm not familiar with every Internet company in the world. If you are, then perhaps you are correct, perhaps not.

But, IMO, it is a safe assumption to say they are one of the two worst. :)
Posted by rcrusoe (1305 comments)
AOL will use the Kaiser Permanente strategy
Since the data has been widely mirrored, AOL will next find a scapegoat so the public will be more worried about those villains that dared to point out the problem and mirror the evidence.

Here is the instant recipe:

1) PR department reaches out to their media contacts. Journalists then tell sensationalist story of "hackers" or "bloggers" who mirrored *your* private data. AOL worms out of responsibility for letting the data loose in the first place by declaring war on the evil bloggers.

2) Now that there's no public support for the blogger, AOL can safely trick a government agency into publicly denouncing the blogger. Since the blogger is clearly a danger to public safety, the government is allowed to ignore all applicable law. After all, their heart was in the right place, and that matters more than an individual's rights. Also, since the press is already committed to portraying the blogger as a villain, the government knows that it will never have to apologize if it makes a mistake. The press has a vested interest in not reporting the error.

3) Next AOL's team of corporate lawyers will file a lawsuit. It doesn't matter if the lawsuit is frivolous - they are after the PR value of "prosecuting on behalf of the public", and reinforcing to the media that the blogger who dared link to the info is the evil one. If the blogger is poor, weak, and has no media platform of their own, then AOL might actually win the lawsuit by default, adding further legitimacy to their "public defender" posture.

4) The public doesn't understand that killing the messenger only guarantees successful cover-ups in the future. And as far as I can tell, they don't care that there is a layer of people whom corporations can calculate as having no Constitutional rights in this country (if a person can't defend their rights, the rights might as well not exist). AOL's "issues management" team is weaving these assumptions into their strategy.

Scapegoating worked for Kaiser Permanente. It'll work for AOL.
Posted by teraphim (7 comments)
Proofing
It's "peek", not "peak". Does anyone proof these articles besides using spellcheck? Major news article, lots of hits, and you lost the mental majority right at the end.
Posted by agutgopostal (1 comment)
Please...learn about search proxies
There are many, and most are free.

Here's one:

http://www.blackboxsearch.com
Posted by talledega500 (23 comments)
Field Day for Hackers
"Although there was no personally identifiable data linked to these accounts..."
This is an absolute lie. I did random searches in the database and just 10 minutes ago was able to sign into someone's admin account for a forum on a well-known website. Also sitting in front of me are a bunch of names, YES REAL NAMES, first and last, with their corresponding "anonymous" user IDs. This is unbelievable!
Posted by mlssry (1 comment)
Download the data here
Download the data here:
http://mysite.verizon.net/laurin99/aol.htm
Posted by mastershake_phd (1 comment)