News Blog

April 14, 2008 9:16 AM PDT

Google dips toes into 'deep Web' search

by Stephen Shankland

Google's search bots, which constantly scour the Web for new pages, have begun a new, more active phase of indexing.

In a blog post Friday, Jayant Madhavan and Alon Halevy of Google's crawling and indexing team said the company has begun an experiment in which its indexing software enters text into Web site forms to see what previously undiscovered pages appear.

"In the past few months, we have been exploring some HTML forms to try to discover new Web pages and URLs that we otherwise couldn't find and index for users who search on Google," they wrote. "This experiment is part of Google's broader effort to increase its coverage of the Web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines."

The new Google indexing practice involves only "high quality" Web sites and doesn't run on sites that use "robots.txt" files or other standard mechanisms to ward off indexing software.
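As a rough illustration of how a crawler might check a robots.txt file before touching a form, here is a minimal sketch using Python's standard urllib.robotparser module. It is not Google's code: the function name may_probe_form, the "ExampleBot" user agent, and the example.com URLs are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def may_probe_form(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url.

    Hypothetical helper for illustration; a real crawler would also
    honor nofollow/noindex hints and download robots.txt itself.
    """
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /search\n"
    print(may_probe_form(rules, "ExampleBot", "http://example.com/search?q=cars"))
    print(may_probe_form(rules, "ExampleBot", "http://example.com/about"))
```

A crawler built along these lines would skip any form whose results URL falls under a disallowed path.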

To decide what words to "type" into the forms, the indexing software samples from among words on the Web page with the form, Google said.
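The sampling idea can be sketched in a few lines of Python. This is only an illustration of picking words from the page that hosts the form and turning them into query URLs. Google has not published its implementation, and the function names, the four-letter minimum word length, the "q" field name, and the example.com URL are all assumptions.

```python
import random
import re
from urllib.parse import urlencode


def candidate_words(page_text, k=3, seed=None):
    """Sample up to k distinct words from the text of the page with the form."""
    # Assumption: keep only alphabetic words of four or more letters.
    words = sorted({w.lower() for w in re.findall(r"[A-Za-z]{4,}", page_text)})
    rng = random.Random(seed)  # a seed makes the sample repeatable
    return rng.sample(words, min(k, len(words)))


def form_urls(action_url, field_name, words):
    """Build one GET-style URL per word, as a form submission would."""
    return [f"{action_url}?{urlencode({field_name: w})}" for w in words]


if __name__ == "__main__":
    text = "Search our catalog of used cars, trucks and motorcycles"
    words = candidate_words(text, k=3, seed=0)
    for url in form_urls("http://example.com/search", "q", words):
        print(url)
```

Each generated URL is a candidate page that a crawler could fetch and, if it returns useful content, add to the index.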

The technology appears to be related to Transformic, a company that Google acquired, according to a blog post by Anand Rajaraman, who was involved with the technology earlier in his career while working for Halevy.

November 12, 2007 7:28 AM PST

Microsoft IE patch eliminates extra step

by Candace Lombardi

The "click to activate" step for using certain interactive Web pages with embedded controls will no longer be required when viewing them with Internet Explorer, Microsoft announced Monday.

Microsoft had kept a "click to activate" requirement for interactive Web pages that embedded controls via HTML, in order to avoid patent infringement.

Microsoft has now licensed the technology from Eolas that allows that interaction to happen automatically. Eolas had been engaged in a long-running patent dispute with Microsoft that resulted in a settlement in August.

The result of that agreement is that IE users will no longer be bothered by that extra step. The change will be included in the Internet Explorer Automatic Component Activation Preview patch available from the Microsoft download site in early December. It will then be included for all IE users when the full IE Cumulative Update goes out in April 2008.

The update will not affect the way pages work, nor will developers and designers need to make any adjustment to the way they build their pages, Pete LePage, senior product manager for Microsoft Internet Explorer, said in a statement.

Developers who have been using workarounds based on WebOC or MSHTML to bypass the "click to activate" step on their own may have to make some adjustments. More information can be found on the IE blog.
