April 23, 2004 4:00 AM PDT

Google's chastity belt too tight

PartsExpress.com proudly touts itself as the Net's No. 1 source for audio, video and speaker components--but online shoppers who rely on an optional feature in the Google search engine to block porn sites would never know it.

News.context

What's new:
Despite claims of "advanced proprietary technology," Google's opt-in porn filter proves no better than the tools of the last decade, blocking many harmless sites, a CNET News.com investigation shows.

Bottom line:
The indiscriminate nature of the tool is bad news for affected businesses. Google is the most widely used search engine, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets.

By an accident of spelling, the domain name of the Ohio electronics retailer includes an unfortunate string of letters, "sex," which is enough to block the Web site from Google's filtered results.

PartsExpress.com is not alone. A CNET News.com investigation shows that Google's SafeSearch filter technology incorrectly blocks many innocuous Web sites based solely on strings of letters such as "sex," "girls" or "porn" embedded in their domain names.

Google's SafeSearch flaws are more than academic--they can have serious consequences for innocent Web site operators blocked out by them. Google is the most widely used search engine on the Web, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets.

Research company WebSideStory reported last month that Google claimed an all-time high in search referrals, 41 percent of the United States total, and the search giant's market share is steadily expanding.

"Traffic from Google can make or break a business," said Maria Medina, whose family-run clothing business at ALittleGirlsBoutique.com doesn't pass the SafeSearch censor. "Here I am, a mom of four children, creating an at-home business that sells little girl dresses and accessories, in order to spend more time with my children, and I have been filtered out as not being family friendly. Ridiculous."

Matt Cutts, the Google engineer who designed SafeSearch four years ago, said his algorithm looks for a "relatively small" number of trigger words in a Web page's address. If one of those words appears, the SafeSearch algorithm puts the address on a block list and does not take the next step of evaluating the content of the site. "We try to find the best trade-off of precision, recall and safety," Cutts said. "People who opt in to SafeSearch are mostly OK with us being on the conservative side."
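
As a rough illustration of the behavior Cutts describes, the following Python sketch flags an address purely because a trigger string appears in its hostname, without ever looking at page content. It is not Google's code: the function names are hypothetical, and the trigger list is an assumption limited to the three strings this article reports ("sex," "girls" and "porn").

```python
from urllib.parse import urlparse

# Assumption: only the three strings named in the article. The real list is not public.
TRIGGER_STRINGS = ("sex", "girls", "porn")


def hostname_of(url: str) -> str:
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def is_blocked(url: str) -> bool:
    """Flag a URL if any trigger string occurs anywhere in its hostname.

    Page content is never inspected, which mirrors the short-circuit
    described in the article.
    """
    host = hostname_of(url)
    return any(trigger in host for trigger in TRIGGER_STRINGS)


if __name__ == "__main__":
    for site in ("PartsExpress.com", "Parts-Express.com", "DressyGirls.com"):
        print(site, is_blocked(site))
```

Under these assumptions, PartsExpress.com is flagged because "parts" and "express" happen to join into "sex," while the hyphenated Parts-Express.com is not.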

Cutts would not disclose how many Web searches are done with SafeSearch enabled, saying only that it's a small percentage of the millions of queries handled by Google each day. But the sloppy filter stands out as a rare black eye for a company that prides itself on superior search technology and boasts on its payroll one of the world's highest concentrations of computer science doctoral degrees. Google claims SafeSearch "uses advanced proprietary technology that checks keywords and phrases" and filters out only Web pages "containing pornography and explicit sexual content."

"That's not very bright," said Karen Schneider, a librarian who runs the Librarians' Index to the Internet and has made a study of filtering software. SafeSearch is "certainly evocative of the very primitive CyberSitter-type tools of the mid-1990s--not a tool of fairly sophisticated development."

The Scunthorpe problem
For years, Web content filters have drawn criticism for inaccuracies. In a famously embarrassing incident in 1996, America Online's errant dirty-word filter prevented residents of the British town Scunthorpe from signing up as new customers. Google's SafeSearch makes the same mistake, blocking local news sites like ThisIsScunthorpe.co.uk and ScunthorpeDistrictCatsProtection.co.uk, a housecat-adoption site.

Other Web sites misidentified by SafeSearch because of "sex" in their domain names include ArkansasExtermination.com, which claims to offer the "best in termite and pest control." The owner of the business, who declined to give his name, said he was puzzled by Google's categorization: "My brother wrote the Web site. I don't know anything about that."

SafeSearch also marked as unsafe for children JewishSussex.com, a religious Web site; EssexCountyBeeKeepers.org of Topsfield, Mass.; BluesExcuse.SouthBurnett.com.au, an Australian blues band's site; BassExpert.com; and the Anglo-Saxon history site RomansInSussex.co.uk.

Gareth Roelofse, the Web designer of RomansInSussex.co.uk, said his filtering complaints are broader than just Google. "We also found many library Net stations, school networks and Internet cafes block sites with the word 'sex' in" the domain name, Roelofse said. "This was a challenge for RomansInSussex.co.uk because its target audience is school children."

"I think it would be nice if Google would have a 'white list' for sites like ours, but this would involve human man-hours, I guess," said Roelofse, who designed the site on behalf of the Sussex Archaeological Society and local museums.

Cutts, the Google software engineer, noted that the SafeSearch Web page permits visitors to contact the company with complaints. "In most cases it's a pretty unambiguous usage," Cutts said about the word "sex" in domain names and Web addresses. "No filter can be 100 percent accurate. We're always willing to take a fresh look at our filter and see how we can improve it."
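
The "white list" Roelofse asks for could, in principle, sit in front of a substring check like the one Cutts describes. The Python sketch below is only an illustration under assumptions: the allowlist contents, the function names and the three trigger strings are hypothetical, and nothing here reflects how Google actually handles complaints.

```python
# Assumption: trigger strings taken from the article; the real list is not public.
TRIGGER_STRINGS = ("sex", "girls", "porn")

# Hypothetical allowlist of hostnames that a human reviewer has cleared.
REVIEWED_ALLOWLIST = frozenset({"romansinsussex.co.uk"})


def normalize(host: str) -> str:
    """Lowercase a hostname and drop a leading 'www.' label."""
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def is_blocked(host: str, allowlist: frozenset = REVIEWED_ALLOWLIST) -> bool:
    """Apply the substring check only to hosts that nobody has reviewed."""
    host = normalize(host)
    if host in allowlist:
        return False
    return any(trigger in host for trigger in TRIGGER_STRINGS)
```

The trade-off is the one Roelofse anticipates: each allowlist entry costs human review time, so a site such as JewishSussex.com stays blocked in this sketch until someone adds it.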

Google is not alone in seeking to lure searchers worried about encountering online raunch and ribaldry: Yahoo offers a "mature Web content" search filter, and Ask Jeeves has set up a separate Web site for kid-friendly searches. But Yahoo's filter isn't as hypersensitive as Google's, and lists domains mentioning Sussex, Essex and Scunthorpe as acceptable.

The flaws in Google's filter have persisted despite research published about a year ago that highlighted overblocking in SafeSearch.

An April 2003 report from Harvard University's Berkman Center described similar but less extensive problems with SafeSearch. That report said some news articles and political Web sites were filtered.

David Drummond, Google's vice president for business development, said that at the time of its development, SafeSearch was designed to be overly cautious. "The thinking was that SafeSearch was an opt-in feature," Drummond said. "People who turn it on care a lot more about something sneaking through than they do about something getting filtered out."

"Plainly silly" blocking
CNET News.com evaluated SafeSearch by testing tens of thousands of random Web pages and identifying which ones were incorrectly listed as pornographic. The results showed that Google encountered many of the same problems that have plagued Internet filters for almost a decade. One 1996 analysis, for instance, showed that CyberPatrol blocked National Rifle Association and gay and lesbian Web sites, and CyberSitter cordoned off Usenet newsgroups such as alt.feminism and soc.support.fat-acceptance.

"None of that surprises me," said Barry Steinhardt, director of the American Civil Liberties Union's (ACLU) technology and liberty program. "The evidence that we put on in the library filtering case shows that it's very difficult to do filtering without being overinclusive, without blocking things that are just plainly silly. That's the reality of relying on blocking: You're going to block a lot of legitimate material."

The ACLU, which has warned against buggy filters since publishing a report on the topic in 1997, unsuccessfully sued to overturn a federal law compelling public libraries to install filtering products.

"In the end, the lists are proprietary," Steinhardt said. "Without access to the lists, you don't know precisely what's being blocked. You have to rely on the authors of the lists to have the right judgment."

The word "girls" also tends to lead SafeSearch astray. It incorrectly blocks the Web sites of the private school GirlsSchoolOfAustin.org; the bridesmaid dress shop DressyGirls.com; TatuGirls.com, a Russian band's site; and TheCalicoGirls.com, a Web site devoted to cat poetry.

"Porn" in a domain name can confuse SafeSearch just as thoroughly. It won't display Pornichet.org, devoted to improving tourism for the French seaside town of Pornichet; SpornGroup.com, a New York-based business consultancy; Sporn.com, which sells dog leashes; PornkRocks.com, a site devoted to the band Pornk; and Anti-Kinderporno.de, a German effort to oppose child pornography.

Aaron Wolfe, information systems director for SafeSearch-banned PartsExpress.com, said the company is planning to excise that unfortunate string of letters from its domain name. "We are going to modify our domain name to Parts-Express.com," Wolfe said, adding that the renaming will also help "get around spam filters on e-mail servers."

18 comments

tatugirls.com
SafeSearch "led astray" by filtering out tatugirls.com?!!!!
Helllllllo??!!! Yeah, they're sooooo sweet and innocent in their
"Russian band" schoolgirl outfits, and won't have ANY influence
on my grade-school kids!

Apparently nobody at CNET watches MTV... this was NOT the
strongest example to use to make the point!
Posted by (2 comments )
Yeah...
Yeah, I laughed at that one, too.

If the almighty PageRank is based on how many other sites link to a site, why can't it be used to help better filter? If all the sites pointing to a site aren't questionable, then maybe the site that's been redflagged might not be questionable?

Still not a perfect solution (especially if a bunch of teenage boys link to the tatu site), but it might be somewhat more intuitive.
Posted by TV James (680 comments )
babysitter
you shouldn't use the computer as a babysitter, and it's not the software's job to screen what your child sees - it's yours. take responsibility for your kids instead of leaving it to some flawed filter software.
Posted by (1 comment )
rationality
The point is that there is nothing pornographic about the Tatu website; it is a site dedicated to the band, and your thoughts on whether it is appropriate for young girls to use sex appeal in their film clips are irrelevant. Filters are not intended to block legitimate content.
Posted by Acidf3d (2 comments )
This is not a drawback
I welcome a completely clean opt-in search. If anything, Google should offer an additional safesearch for those worried they will miss out on some results, but don't mind some level of objectionable material slipping through.
Posted by phasam (8 comments )
Even Google's normal (default) filtering is too restrictive
You never know what you're missing until you click 'Preferences / Do not filter my search results.' Unfortunately for people like me who dump their 'cookies' every time their browser closes, my search results are all too frequently truncated unless I remember to reset Preferences *every* time.

Can anyone recommend a *good* search site?

And, yes, I snickered at the reference to Tatu (in this context) too . . .
Posted by R_Harvey (2 comments )
"sex" is now unblocked by SafeSearch
Same with "at ALittleGirlsBoutique.com". Does it mean that Google people are particularly quick on the uptake or that Mr McCullagh's research was dated when his article on SafeSearch came out yesterday?

As to the "Scunthorpe incident", Scunthorpe people seem to have found ways around the blocking of the town's name in URLs: the two "blocked" sites mentioned in the article can be reached with an extra click from pages that don't have the town's name in their URLs. For instance, http://www.rhatcliffe.freeserve.co.uk/scun_cats_page.htm is unblocked and has a link to http://www.scunthorpedistrictcatsprotection.co.uk/. Was the former created to counter SafeSearch's blocking of the latter, or because the town's name got blocked in general by filters? It would have been interesting to know, but there is no info on this in Mr McCullagh's article.

Another puzzling thing: at the end of his "chastity belt" article, Mr McCullagh gives a link to his own article on Ben Edelman's empirical analysis of SafeSearch (1), but that's all. It would have been great if Mr McCullagh's article had contributed to the further research suggested in the conclusions of Ben Edelman's Empirical Analysis. Unfortunately, it doesn't.

(1) http://news.cbsi.com/2100-1032-996417.html For Edelman's research itself: http://cyber.law.harvard.edu/people/edelman/google-safesearch.

Cordially

Claude Almansi
http://www.adisi.ch
Posted by (1 comment )
Moderate SafeSearch ON by default!
When you go to Google, hit Preferences. You will notice a little boxed-in area with the filter selections - the so-called "moderate" SafeSearch is on. If you modify the filter (turn it off or turn it up), it creates a cookie on your system, which can be deleted if you clear out your temp files. Unfortunately, what Google doesn't realize is that this type of unannounced cookie is illegal in some states (such as Tennessee). They need to post that their site will create a cookie so that they don't get into any legal trouble. They also need to have the filtering options on the main search page so that you don't have to look at the extremely tiny print to the right of the search bar to see "Preferences."
Posted by neptolac (12 comments )
unannounced cookie is illegal
http://www.analogstereo.com/gmc_safari_owners_manual.htm
Posted by George Cole (314 comments )
simple suggestion
I am no expert when it comes to search engines or how they function, but surely the filtering tool could be modified to check, every time it finds a keyword such as sex, whether it is really just part of common English words linked together. For example, couldn't SafeSearch, once it finds the string RomansInSussex, simply search backwards until it finds the U, then the S, and once it has identified Sussex, then abort the block? Would this take up too much of the system resources?
Posted by Acidf3d (2 comments )
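
One way to read this suggestion is as a word-segmentation check: when a trigger string shows up, block only if the name cannot be split cleanly into known innocuous words. The Python sketch below illustrates that idea under stated assumptions. The tiny vocabulary, the function names and the trigger list are all hypothetical, and a real system would need a far larger word list that includes place names.

```python
from functools import lru_cache

# Assumption: trigger strings taken from the article; the real list is not public.
TRIGGER_STRINGS = ("sex", "girls", "porn")

# Hypothetical vocabulary of innocuous words, kept tiny for illustration.
SAFE_WORDS = frozenset({"romans", "in", "sussex", "parts", "express", "bass", "expert"})


def splits_into_safe_words(label: str) -> bool:
    """Return True if the whole label can be segmented into SAFE_WORDS entries."""
    label = label.lower()

    @lru_cache(maxsize=None)
    def can_split(start: int) -> bool:
        # Reaching the end means every character was covered by a safe word.
        if start == len(label):
            return True
        return any(
            label[start:end] in SAFE_WORDS and can_split(end)
            for end in range(start + 1, len(label) + 1)
        )

    return can_split(0)


def is_blocked(label: str) -> bool:
    """Block a domain label only if a trigger appears and no safe split exists."""
    label = label.lower()
    if not any(trigger in label for trigger in TRIGGER_STRINGS):
        return False
    return not splits_into_safe_words(label)
```

The memoized search keeps the cost low for names of domain length, so system resources are unlikely to be the obstacle. The harder problem is vocabulary coverage: with this small word list, JewishSussex would still be blocked because "jewish" is missing from it.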
Google SafeSearch family friendly and safe search engine
An interesting knock off from Google is the EnterSearchTerm.org search engine that only gives you Google SafeSearch results.

http://www.entersearchterm.org
Posted by lysglimt (10 comments )
Google SafeSearch family friendly and safe search engine
http://www.entersearchterm.org does not seem to be working. Another alternative that returns filtered results is http://www.googlesafe.com
Posted by (1 comment )
You have misinterpreted some of the data
SafeSearch is not blocking ALittleGirlsBoutique.com at all. Pages from that site come up if you search for content from it. However, the results indicate that not all the pages have been crawled. Matt Cutts, a Google engineer, has reported on his blog that SafeSearch won't return listings for uncrawled pages.

Many site operators who feel their sites are being blocked by SafeSearch may simply not have had them crawled when the current index was built. It may only require a few more links to their sites to ensure that Google crawls them. They should also make sure their internal linkage is set up correctly (including the use of HTML site maps if they have more than a few pages).
Posted by (2 comments )
uncrawled pages
http://www.analogstereo.com/rover_75_owners_manual.htm
Posted by George Cole (314 comments )
Every few weeks I have to reset my safesearch preferences to "do not filter" status.

My cookies are fine, I've checked.

Does Google reset peoples' preferences periodically? Has anyone else had this problem?

Just curious.
Posted by Bendinae (1 comment )
Hey there everyone from 2004! My name's Ryan, and I come from the future! Isn't that cool?

Anyway, Google's SafeSearch functionality is still problematic, with all *kinds* of false positives. Best of all, however, Google is now forcing everyone to use SafeSearch, with absolutely no way to disable it.

You can follow the progress (and read all about how a bunch of loyal Google users - including myself - are switching to other search engines, like Bing and DuckDuckGo) at https://productforums.google.com/forum/#!msg/websearch/WIPzdBq6E4Y/hb31FQSPLlkJ
Posted by northrupthebandgeek (10 comments )
 
