April 14, 2008 9:16 AM PDT

Google dips toes into 'deep Web' search

by Stephen Shankland

Google's search bots, which constantly scour the Web for new pages, have begun a new, more active phase of their indexing work.

In a blog post Friday, Jayant Madhavan and Alon Halevy of Google's crawling and indexing team said the company has begun an experiment in which its indexing software enters text into Web site forms to see what previously undiscovered pages appear.

"In the past few months, we have been exploring some HTML forms to try to discover new Web pages and URLs that we otherwise couldn't find and index for users who search on Google," they wrote. "This experiment is part of Google's broader effort to increase its coverage of the Web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines."

The new Google indexing practice involves only "high quality" Web sites and doesn't run on sites that use "robots.txt" files or other standard mechanisms to ward off indexing software.
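As a rough illustration of how a crawler can honor a robots.txt file before touching a form's target URL, here is a minimal Python sketch built on the standard library's urllib.robotparser. It is not Google's code: the may_crawl helper, the "ExampleBot" user agent, and the example.com URLs are all hypothetical names chosen for the example.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration; a real crawler would download
    robots.txt from the site instead of receiving it as a string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules (an assumption): the site asks all bots to stay out of /search.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
"""

if __name__ == "__main__":
    for url in ("http://example.com/search?q=trucks", "http://example.com/about"):
        verdict = "allowed" if may_crawl(EXAMPLE_ROBOTS, "ExampleBot", url) else "blocked"
        print(url, "->", verdict)
```

Under these example rules, a form that submits to /search would be off limits to a well-behaved bot, while ordinary pages elsewhere on the site would remain crawlable.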

To decide what words to "type" into the forms, the indexing software samples from among words on the Web page with the form, Google said.
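The word-sampling idea can be sketched in a few lines of Python. This is only a guess at the general shape of the technique, not Google's implementation: the candidate_words and sample_queries functions, the four-letter minimum, and the sample page are all assumptions made for the example.

```python
import random
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def candidate_words(html, min_length=4):
    """Return the sorted, de-duplicated words found in the page's visible text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    words = re.findall(r"[a-z]+", " ".join(extractor.chunks).lower())
    return sorted({w for w in words if len(w) >= min_length})


def sample_queries(html, k=3, seed=None):
    """Pick up to k distinct words from the page to try as form input."""
    words = candidate_words(html, min_length=4)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(words, min(k, len(words)))


# Hypothetical page containing a search form.
PAGE = """
<html><body>
<h1>Used car listings</h1>
<p>Search our inventory of sedans, trucks and hybrids.</p>
<form action="/search"><input name="q"><input type="submit"></form>
<script>var tracking = true;</script>
</body></html>
"""

if __name__ == "__main__":
    print(candidate_words(PAGE))
    print(sample_queries(PAGE, k=3, seed=0))
```

Each sampled word would then be submitted through the form, and any result pages that come back become candidates for indexing. Drawing words from the page itself is a cheap way to pick inputs that are likely to match the site's own content.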

The technology appears to be related to Transformic, a company Google acquired, according to a blog post by Anand Rajaraman, who was involved with the technology earlier in his career while working for Halevy.

Originally posted at News Blog
Stephen Shankland writes about a wide range of technology and products, but has a particular focus on browsers and digital photography. He joined CNET News in 1998 and since then also has covered Google, Yahoo, servers, supercomputing, Linux and open-source software, and science. E-mail Stephen, or follow him on Twitter at http://www.twitter.com/stshank.
Another tool for recruiters and headhunters?
by SteveCherry April 14, 2008 10:13 AM PDT
Deep web is nothing new, but it would be a new capability to an already dominant force in the research toolbox. Google's interface has been the tool of choice for recruiters and headhunters for years now, but an added deep web capability could really change the game.

Yahoo has never even come close to the older capabilities of Google, so this would ensure a healthy lead for some time.

Steve Cherry
www.sandiegoemployer.com
by zumpa1 February 16, 2009 7:10 PM PST
I wonder if it would post on forums and stuff. If so, I think this site may be a test run of Google talking to itself: http://www.f2bb.com
I noticed that
by t8 April 14, 2008 3:01 PM PDT
Google had some URLs for my website that were search strings.

I am happy about that too.
Interesting--were pages ranked high?
by Shankland April 14, 2008 9:33 PM PDT
I'm guessing you noticed the string-based results via Google search. Did your pages show up high in the results?
Economic Stimulus Package For SEOs
by pbarnhart01 April 15, 2008 5:02 AM PDT
We looked at this on the 13th in "172 and one-half ways Google Form Spidering Is Kewl" - this will be a boon for SEO/SEM consultants seeking new ways to game it all: http://pbarnhart.wordpress.com/2008/04/13/172-and-one-half-ways-google-form-spidering-is-kewl/
Sketchy...
by 47project April 15, 2008 9:53 AM PDT
Just think about how many HTML forms do not check their referrer or effectively protect their data against injection by spammers and bots. It's kinda scary to think about how Google could get into data that may not have been intended for the public eye but is accessible because of its insecure implementation.

Make sure you have web developers that know what they're doing! Eeeek.