May 4, 2009 12:31 PM PDT

Patent reveals Google's book-scanning advantage

by Stephen Shankland

Sometimes overlooked in the Sturm und Drang about Google Book Search is any consideration of the mechanics of economically scanning the books in the first place, but a patent awarded to Google gives insight into how the search behemoth accomplishes the task.

In short, Google has come up with a system that uses two cameras and infrared light to automatically correct for the curvature of pages in a book. By constructing a 3D model of each page and then "de-warping" it afterward, Google can present flat-looking pages online without having to slice books up or mash them onto a flatbed scanner.

This diagram shows patented Google technology for correcting for curved pages while scanning books.


(Credit: Google)

The sophistication of the technology illustrates that would-be competitors who want to feature their own digitized libraries won't have a trivial time catching up to Google, which already has scanned more than 7 million books. Any unskilled laborer can plop a book on an ordinary scanner and run some optical character recognition (OCR) operations that convert the imagery into textual data, but doing so rapidly and with high-quality images is another matter.

Here's how the Google system is described in Patent 7,508,978:

This pattern can be shown on the book with infrared light; infrared cameras photograph it to deduce the 3D shape of the pages.


(Credit: Google)

First, the book is placed on a flat surface. Above it, an infrared projector displays a special mazelike pattern onto the pages.

Next, two infrared cameras photograph the infrared pattern from different perspectives.

"The images can be stereoscopically combined, using known stereoscopic techniques, to obtain a three-dimensional mapping of the pattern," according to the patent. "The pattern falls on the surface of (the) book, causing the three-dimensional mapping of the pattern to correspond to the three-dimensional surface of the page of the book."
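
The patent text quoted above refers only to "known stereoscopic techniques" and does not spell out the math. As a rough illustration, here is a minimal sketch of textbook stereo triangulation, assuming the two infrared cameras form a rectified pair (same focal length, image rows aligned). The function names and the parameters `focal_px`, `baseline_mm`, and `camera_height_mm` are hypothetical and do not come from the patent.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_mm):
    """Distance (mm) from the cameras to one pattern point.

    Assumes a rectified stereo pair: the same pattern feature appears at
    column x_left in the left image and x_right in the right image.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a visible point must have positive disparity")
    # Classic pinhole relation: depth = focal length * baseline / disparity.
    return focal_px * baseline_mm / disparity


def page_heights(matches, focal_px, baseline_mm, camera_height_mm):
    """Height of the page above the flat surface at each matched point.

    `matches` is a list of (x_left, x_right) column pairs for pattern
    features found in both infrared images (hypothetical input format).
    """
    return [
        camera_height_mm - depth_from_disparity(xl, xr, focal_px, baseline_mm)
        for xl, xr in matches
    ]
```

Points on the raised, curved part of a page sit closer to the cameras, so they show a larger disparity and a greater computed height than points near the flat surface.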

Next, photos of the page taken with conventional cameras can be de-warped, permitting easier OCR and a better image when showing the real book in conjunction with search results based on the text.
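
The article does not say how the de-warping step is computed. One simple way to picture it, offered only as a sketch under stated assumptions, is to treat the page as curving in a single direction (like a section of a cylinder) and to measure distance along the curved surface instead of straight across it. The function name and the inputs `xs` and `zs` below are hypothetical.

```python
import math


def flattened_positions(xs, zs):
    """Map sample points on a curved page to positions on a flat page.

    xs: horizontal positions of samples across the page, as seen from above.
    zs: page height at each sample (for example, from a 3D mapping).
    Returns the cumulative arc length along the page cross-section, which is
    where each sample would land if the page were pressed flat.
    Assumes the page bends along one axis only (a hypothetical simplification).
    """
    if len(xs) != len(zs):
        raise ValueError("xs and zs must have the same length")
    if not xs:
        return []
    flat = [0.0]
    for i in range(1, len(xs)):
        # Length of the short surface segment between neighboring samples.
        step = math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1])
        flat.append(flat[-1] + step)
    return flat
```

For a page whose cross-section rises 4 units over 3 and then falls back, `flattened_positions([0, 3, 6], [0, 4, 0])` stretches the 6-unit-wide view to a 10-unit-wide flat page. Pixels from the conventional photo would then be resampled at those flattened positions.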

Via National Public Radio

Stephen Shankland writes about a wide range of technology and products, but has a particular focus on browsers and digital photography. He joined CNET News in 1998 and since then also has covered Google, Yahoo, servers, supercomputing, Linux and open-source software, and science. E-mail Stephen, or follow him on Twitter at http://www.twitter.com/stshank.
by hansschmucker2 May 4, 2009 1:27 PM PDT
As much as I can see the value in this system, I still have to wonder: What's the invention here? I mean, everything that's used here is covered either by other patents or common knowledge, except, well, that you can use it to flatten pages of a book. That's it. Is that really something that's worth a patent?
by rapier1 May 4, 2009 1:33 PM PDT
Combining known techniques in novel ways to address a demonstrable need is indeed patentable.
by renopanther May 4, 2009 7:26 PM PDT
The better mouse trap always 'seems' simple after you see it. This is a smart way to scan pages without damaging the books or re-inventing the wheel.
by marap May 4, 2009 1:45 PM PDT
Innovation my dear .. innovation.
by tonari-no-totoro May 4, 2009 3:19 PM PDT
Read Vernor Vinge's Rainbows End for an even more clever way to digitize an entire library in a few days.

Ironically, you can read what Vinge wrote on Google Book Search, since his book has already been digitized by Google.

http://books.google.com/books?id=SrLwPdBJodMC&dq=rainbows+end&printsec=frontcover&source=bn&hl=en&ei=B2n_SfONK4WItAPZ3Z3tBQ&sa=X&oi=book_result&ct=result&resnum=4#PPA123,M1
by spoonie1972 May 4, 2009 3:40 PM PDT
see: Intermittent Wipers.

Interesting way to tackle a problem. Touché.
by t8 May 4, 2009 4:01 PM PDT
I claim the idea for a scanner that can look at all the pages without opening the book, by X-ray or a similar spectral wave. Then the pages will be flat and you can scan the whole book in one go. This is similar to how archeologists map/scan underground systems or how you can input data onto a multilayer DVD and read the different layers. Question for the lawyers: Is this post considered prior art?
by mikeburek May 5, 2009 11:44 PM PDT
There actually is a scanner that can do such a feat. It can read through material, and can calibrate the depth it scans measured in atoms, or something small, so it can move through a book at 1/2 page width increments and read just the ink atoms.
by skelem May 4, 2009 5:07 PM PDT
You can't claim an idea. You have to have a method for accomplishing your idea, expressed in enough detail that someone versed in the art could build a device for accomplishing said method.
by t8 May 4, 2009 5:45 PM PDT
Thx
by May 4, 2009 5:58 PM PDT
So why not just slice off the binding and throw the book in an automatic sheet feeder on a scanner? There's usually enough margin to allow for a small portion of the edge of the page to be cut. Problem with above method is you still need to turn the pages.
by renopanther May 4, 2009 7:28 PM PDT
What if it's the Gutenberg bible?
You can't just trash an entire library. Rebinding the library would be super expensive too.
by rayzoredge May 5, 2009 6:57 AM PDT
The way that they do it IS pretty darn quick. It doesn't beat an auto paper feeder, but when you hire a bunch of people to flip a page a second, you can get through a lot of literature without damaging the book.
by paulej May 6, 2009 10:35 AM PDT
Who flips the pages?