Patent reveals Google's book-scanning advantage
Sometimes overlooked in the Sturm und Drang about Google Book Search is any consideration of the mechanics of economically scanning the books in the first place, but a patent awarded to Google gives insight into how the search behemoth accomplishes the task.
In short, Google has come up with a system that uses two cameras and infrared light to automatically correct for the curvature of pages in a book. By constructing a 3D model of each page and then "de-warping" it afterward, Google can present flat-looking pages online without having to slice books up or mash them onto a flatbed scanner.
This diagram shows patented Google technology for correcting for curved pages while scanning books. (Credit: Google)
The sophistication of the technology illustrates that would-be competitors who want to feature their own digitized libraries won't have a trivial time catching up to Google, which already has scanned more than 7 million books. Any unskilled laborer can plop a book on an ordinary scanner and run some optical character recognition (OCR) operations that convert the imagery into textual data, but doing so rapidly and with high-quality images is another matter.
Here's how the Google system is described in Patent 7,508,978:
This pattern can be shown on the book with infrared light; infrared cameras photograph it to deduce the 3D shape of the pages. (Credit: Google)
First, the book is placed on a flat surface. Above it, an infrared projector displays a special mazelike pattern onto the pages.
Next, two infrared cameras photograph the infrared pattern from different perspectives.
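The patent text quoted here doesn't say how points in one camera's photo are matched to the same points in the other's. One textbook approach for a rectified camera pair is block matching: slide a small window along the same row of the second image and keep the shift (the "disparity") where the pixels agree best. A distinctive projected pattern is presumably what makes such matches unambiguous even on blank paper, but that is an assumption. The function name `match_scanline`, its parameters and the sum-of-absolute-differences cost are illustrative choices, not Google's method.

```python
def match_scanline(left, right, window=2, max_disparity=8):
    """Estimate a disparity for each pixel in one row of a rectified stereo pair.

    Hypothetical helper: for each column x in `left`, try every shift d in
    0..max_disparity and compare the window around x with the window around
    x - d in `right`, keeping the shift with the smallest sum of absolute
    differences (SAD). Pixels where no candidate window fits keep disparity 0.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disparity + 1):
            cost = 0
            for k in range(-window, window + 1):
                xl, xr = x + k, x + k - d
                if not (0 <= xl < n and 0 <= xr < len(right)):
                    cost = None  # window falls off an image edge
                    break
                cost += abs(left[xl] - right[xr])
            if cost is not None and cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Synthetic "projected pattern": all values distinct, so every window is unique.
left = [(i * 37) % 101 for i in range(60)]
right = left[3:] + [0, 0, 0]  # the same row, seen 3 pixels further left
print(match_scanline(left, right)[10])  # 3
```

Real systems do this in two dimensions with subpixel refinement; the one-row version just shows the idea.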
"The images can be stereoscopically combined, using known stereoscopic techniques, to obtain a three-dimensional mapping of the pattern," according to the patent. "The pattern falls on the surface of (the) book, causing the three-dimensional mapping of the pattern to correspond to the three-dimensional surface of the page of the book."
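The "known stereoscopic techniques" aren't spelled out, but for an idealized pair of parallel cameras the standard relation is that depth equals focal length times camera separation divided by disparity, Z = f·B/d. The sketch below assumes that setup; the focal length in pixels, the baseline in millimetres and the principal point (cx, cy) are hypothetical calibration values, not figures from the patent.

```python
def triangulate(x_left, y, disparity, focal_px, baseline_mm, cx, cy):
    """Convert a matched pixel and its disparity into a 3D point (millimetres).

    Assumes an idealized rectified pair: parallel cameras with the same focal
    length `focal_px` (in pixels), separated horizontally by `baseline_mm`,
    with principal point (cx, cy) in the left image.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive to triangulate")
    z = focal_px * baseline_mm / disparity  # depth: Z = f * B / d
    x = (x_left - cx) * z / focal_px        # back-project the column
    y_mm = (y - cy) * z / focal_px          # back-project the row
    return (x, y_mm, z)


print(triangulate(1100, 800, 200, focal_px=1000, baseline_mm=100, cx=1000, cy=800))
# (50.0, 0.0, 500.0)
```

Repeating this for every matched pixel of the projected pattern gives a cloud of points on the page surface. With the cameras looking down at the table, points with a smaller Z sit higher above it, which is the curvature the de-warping step has to undo.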
Finally, photos of the page taken with conventional cameras can be de-warped using that 3D map, permitting easier OCR and a better image when showing the real book in conjunction with search results based on the text.
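The article doesn't describe the de-warping math. A deliberately simplified sketch, assuming the page curls only across its width (every image row shares one height profile, like a bent sheet) and that heights are already in pixel units, is to re-space each row by arc length: distance measured along the paper rather than straight across the photo. All function names here are hypothetical.

```python
import bisect
import math


def arc_length_profile(xs, zs):
    """Cumulative distance along the curved page at each sample (same units as xs, zs)."""
    s = [0.0]
    for i in range(1, len(xs)):
        s.append(s[-1] + math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1]))
    return s


def flat_to_source_x(xs, s, target):
    """Find the photographed column x whose arc-length position equals `target`."""
    if target <= s[0]:
        return xs[0]
    if target >= s[-1]:
        return xs[-1]
    i = bisect.bisect_right(s, target)          # s[i-1] <= target < s[i]
    t = (target - s[i - 1]) / (s[i] - s[i - 1])
    return xs[i - 1] + t * (xs[i] - xs[i - 1])  # linear interpolation


def sample(row, x):
    """Linearly interpolate a pixel value at fractional column x."""
    i = int(math.floor(x))
    if i >= len(row) - 1:
        return float(row[-1])
    f = x - i
    return row[i] * (1 - f) + row[i + 1] * f


def dewarp_row(row, zs, out_width):
    """Resample one image row so equal output steps are equal distances on the paper.

    Hypothetical helper: assumes column k of `row` was photographed at x = k and
    that zs[k] is the page height there, in the same pixel units.
    """
    if out_width < 2:
        raise ValueError("out_width must be at least 2")
    xs = list(range(len(row)))
    s = arc_length_profile(xs, zs)
    step = s[-1] / (out_width - 1)
    return [sample(row, flat_to_source_x(xs, s, j * step)) for j in range(out_width)]
```

On a flat page the row comes back unchanged. Where the page bulges, `arc_length_profile` reports more paper than the photo's width, so choosing `out_width` near `s[-1]` stretches the foreshortened, curled region back out to its true size.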
Stephen Shankland writes about a wide range of technology and products, but has a particular focus on browsers and digital photography. He joined CNET News in 1998 and since then also has covered Google, Yahoo, servers, supercomputing, Linux and open-source software, and science. E-mail Stephen, or follow him on Twitter at http://www.twitter.com/stshank. 





Ironically, you can read what Vinge wrote on Google Book Search, since his book has already been digitized by Google.
http://books.google.com/books?id=SrLwPdBJodMC&dq=rainbows+end&printsec=frontcover&source=bn&hl=en&ei=B2n_SfONK4WItAPZ3Z3tBQ&sa=X&oi=book_result&ct=result&resnum=4#PPA123,M1
Interesting way to tackle a problem. Touché.
You can't just trash an entire library. Rebinding the library would be super expensive too.
- by paulej May 6, 2009 10:35 AM PDT
- Who flips the pages?