February 26, 2009 4:00 AM PST

IBM voice ace: Kindle no threat to audio books

by Greg Sandoval

Executives at the Authors Guild say the text-to-speech feature in Amazon's Kindle 2 could hurt sales of audio books. Not all experts agree, including one the guild itself has quoted.

Andy Aaron, an IBM text-to-speech expert, says synthetic voices don't know when to add emphasis or inflection when reading.

(Credit: Andy Aaron)

Andy Aaron, an expert on text-to-speech technology, recently commented in an interview about how much such systems have advanced. In an op-ed piece published Tuesday in The New York Times titled "The Kindle Swindle?" Roy Blount Jr., president of the Authors Guild, used Aaron's quotes to support his argument that the Kindle's voice feature could threaten the future of audio books.

But when asked to elaborate, Aaron told CNET News on Wednesday that the audio-book market has little to fear from "synthetic voices."

"I'm a big believer in (text-to-speech) and a booster of it," said Aaron, who is with IBM's Watson Research Center. "But I don't think at this point, or for the foreseeable future, it's going to compete meaningfully with a professional book reader...Am I going to sit down and put my feet up and listen to text-to-speech read 'War And Peace' or Harry Potter for six to eight hours? For someone who has the choice, I think they would rather get an audio book."

Amazon appears headed towards a showdown with the Authors Guild over text-to-speech technology, which enables computers to read text aloud in a lifelike voice. Paul Aiken, executive director of the Authors Guild, a trade group representing 9,000 authors, argues that Amazon isn't compensating authors for Kindle's text-to-speech feature. He claims authors' copyrights are being violated.

Amazon representatives did not respond to a request for comment.

Aiken generated a lot of attention when he first raised concerns about the Kindle following the debut earlier this month of the e-book reader. On Wednesday, Aiken said Amazon never informed the guild--or book publishers for that matter--of the retailer's plan to include the feature.

In the weeks since the Kindle debut, the guild has had discussions with Amazon and the online retailer is taking a "hard-line position," Aiken said. All this doesn't bode well for finding an amicable resolution.

Aiken wouldn't say what the guild's plans are but confirmed that guild administrators won't rule out filing a lawsuit.

"Anytime you have a new means of accessing content," said Aiken, "there's always some sort of aggregator that wants to control it and keep the value for themselves."

As for Aaron's assertions that text-to-speech systems won't threaten audio books for a long time, Aiken says nobody knows the future.

"Things move quickly," Aiken said. "I think the technology has made a generational leap in just the last few years."

To prove the point, the guild has posted demonstrations of text-to-speech technologies offered by Apple four years ago (the video posted above). The voice is monotone and unintelligible in places. It sounds like it was lifted from a bad sci-fi film.

The next clip is a recording of Kindle's text-to-speech offering. (At right, I've included a humorous demonstration of the Kindle's text-to-speech function posted to YouTube by a user called Kindlejunkie). The differences are sharp. The Kindle's voice pronounces words clearly and sounds far more lifelike. There is, however, no inflection or emphasis. The thing drones on.

It's not that the technology can't create dramatic effects. Aaron says the technology has advanced to a point where synthetic voices can be made to sound happy or apologetic. The major roadblock for these systems, however, is that they don't know when to insert these effects or choose the effect that is most appropriate.

What's missing in computers is the ability to understand what they're reading, said Aaron.

"Even a mediocre human reader is interacting with the text and understands every word that he or she is reading," Aaron said. "Text-to-speech doesn't. It can be really good. It can be really smooth. It can sound very lifelike. But it doesn't understand what it's reading. Do you want to listen to a reader that doesn't understand what they're reading?"

The obvious question here is if text-to-speech systems can read something with a specific emotional tone, couldn't a publisher go into a digital book and mark where they want to insert a specific effect?

They could, says Aaron, but that would take an enormous amount of time and expense. At that point it's easier to hire a human reader and create an audio book.
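For readers curious what that hand-marking might look like, here is a minimal sketch. It assumes the publisher uses SSML-style `<emphasis>` tags (SSML is a W3C markup standard for speech synthesis); the article does not say which format, if any, Amazon or IBM would accept, and the `annotate` function and sample sentence are hypothetical.

```python
from xml.sax.saxutils import escape


def annotate(text, cues):
    """Wrap hand-picked phrases in SSML-style emphasis tags.

    `cues` maps an exact phrase to an emphasis level such as
    "strong", "moderate" or "reduced". Only the first occurrence
    of each phrase is tagged. This is an illustrative helper, not
    any vendor's real tooling.
    """
    marked = escape(text)  # make &, < and > safe for XML
    for phrase, level in cues.items():
        tagged = f'<emphasis level="{level}">{escape(phrase)}</emphasis>'
        marked = marked.replace(escape(phrase), tagged, 1)
    return f"<speak>{marked}</speak>"


# An editor has to supply every cue by hand, sentence by sentence.
print(annotate("I never said she stole the money.", {"never": "strong"}))
```

Even in this toy form, every cue is a human decision about a single phrase, which is the cost Aaron is describing when the text runs to hundreds of pages.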

Here's a little bit about how they create a voice for text-to-speech. First, a professional reader is hired to read text created for its "phonemic diversity." The sentences are designed to cover a wide range of word sounds. The process takes more than 60 hours to complete, Aaron said.

Algorithms are used to help figure out how to manipulate the sounds correctly.
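To give a rough sense of what choosing sentences for "phonemic diversity" could involve, here is a small sketch. It is not IBM's method, which the article does not describe. It assumes a simple greedy selection, and it uses adjacent letter pairs as a crude stand-in for real phonemes, which would require a pronunciation dictionary. The names `sound_units` and `pick_script` are hypothetical.

```python
def sound_units(sentence):
    """Stand-in for phonemes: adjacent letter pairs inside each word."""
    units = set()
    for word in sentence.lower().split():
        letters = [c for c in word if c.isalpha()]
        units.update(a + b for a, b in zip(letters, letters[1:]))
    return units


def pick_script(candidates, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered units."""
    covered, script = set(), []
    remaining = list(candidates)
    while remaining and len(script) < max_sentences:
        # Choose the sentence contributing the most new sound units.
        best = max(remaining, key=lambda s: len(sound_units(s) - covered))
        gain = sound_units(best) - covered
        if not gain:
            break  # nothing left adds new coverage
        script.append(best)
        covered |= gain
        remaining.remove(best)
    return script, covered


script, covered = pick_script(["the cat sat", "the cat", "a dog ran"], 3)
print(script)        # sentences chosen for the recording session
print(len(covered))  # number of distinct units those sentences cover
```

The design point is that a sentence repeating sounds already recorded adds nothing, so it is skipped; a real recording script would apply the same idea to actual phonemes and their combinations.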

Aiken concedes that text-to-speech systems can't provide many of the dramatic effects that a human can. But he does think they're good enough to erode sales of audio books.

One thing to remember is that the potential to compete with audio books is only one part of the guild's complaint. Aiken argues that Kindle's voice feature should be considered a separate derivative and authors should share in its revenues.

What's certain is that guild managers don't believe Amazon should give text-to-speech away for free just to help market Kindles.

"This should be considered a legitimate new market for publishers and authors," Aiken said. "It's a technology that should be used for incremental revenue. With all the squeezing that's going on in publishing, you just can't let this one go."

Greg Sandoval covers media and digital entertainment for CNET News. He is a former reporter for The Washington Post and the Los Angeles Times. E-mail Greg, or follow him on Twitter at http://twitter.com/sandoCNET.
by puterhead February 26, 2009 4:45 AM PST
I don't understand this whole argument personally. Does the Authors guild receive royalties from every computer manufacturer like dell or HP, every software maker like Microsoft or Apple, every maker of scanners and pens that convert text to speech etc.? My daughter in kindergarten has a "toy" that will take any children's book she picks up and read it to her as she moves the unit over the text.
by essinger February 26, 2009 7:19 AM PST
@puterhead

I think the big difference is that Amazon is selling the books and advertising that a benefit of buying the book from the Kindle, as opposed to other sources, is that you get an audio book experience as well as a text experience. Remember these authors aren't selling plain text versions of books that can be read by any computer. Amazon promised to protect the text version with their DRM technology. However Amazon, without the author's consent, is breaking their own protection to give itself an exclusive advantage over all other sellers of the material. I think the authors feel that the only one who should be able to assign such exclusive rights is the original copyholder. And remember, Amazon has contracts, which they have a special duty to execute in good faith, with many of these same authors to sell audio versions of their works.
by tauvix February 26, 2009 8:55 AM PST
From what I understand, Amazon's text to speech application will "read" any text loaded to it. You do not have to purchase the books from Amazon to load it to the Kindle.

Also, I don't know if you read the article above, but it is NOT an audio book experience. There is no inflection, different voices for different characters in the story, no emotion. It's a flat reading of text.

Additionally, you are ignoring fair use (applicable in the US only):

From http://w2.eff.org/IP/eff_fair_use_faq.php:

Space-shifting or format-shifting - that is, taking content you own in one format and putting it into another format, for personal, non-commercial use. For instance, "ripping" an audio CD (that is, making an MP3-format version of an audio CD that you already own) is considered fair use by many lawyers, based on the 1984 Betamax decision and the 1999 Rio MP3 player decision (RIAA v. Diamond Multimedia, 180 F. 3d 1072, 1079, 9th Circ. 1999.)

So, if taking a digital raw audio track from a CD and converting it to a digital file is legit, so is taking a digital file and converting it to raw audio, provided it is for your exclusive personal use. You are not permitted to redistribute it. Copyright only protects a work against reproduction for distribution, not for personal use.
by tauvix February 26, 2009 8:57 AM PST
Correction to my previous comment. The last line should read "Copyright only protects a work against reproduction for distribution, not for personal use, provided you already hold a legal license to the work."
by keithrt49 February 26, 2009 7:29 AM PST
Is the Author's Guild implying that if I read a book aloud to a child I'm violating the publisher's copyright? This is a red herring if there ever was one.
by McbarCODE February 26, 2009 9:32 AM PST
^^^ My sentiments exactly. ^^^
by dirty55409 February 26, 2009 1:08 PM PST
inflection is obviously important.... The read book would put you to sleep. This would Never.... I repeat Nevahhhh replace audio books. Where a real life person (sometimes even a celebrity) does the reading. They are professionals at what they do and this is a joke with a super high price tag.
by markdoiron February 26, 2009 2:49 PM PST
From the article: "'Anytime you have a new means of accessing content,' said Aiken, 'there's always some sort of aggregator that wants to control it and keep the value for themselves.'"

Yes. And that would be you, Mr. Aiken, to the detriment of the consumer who has paid for that content.

--mark d.
by Anysia February 26, 2009 5:24 PM PST
I have heard what the Kindle sounds like, and I have audio books. The Kindle, as much improved as it is, is not competition for audio books. Also this, if someone has the e-book Kindle, the odds are pretty good they weren't even planning on getting the audio book version.
by kscottdunn February 26, 2009 9:27 PM PST
I have a Kindle 2 and an Audible.com account. The text to speech feature is nice and I might use it now and then (I can see being in a good part of a book but having to drive to work or some other task), but there is no way I would consider listening to it for a long period of time. This definitely won't be causing me to buy any fewer audiobooks. I think the author's guild is going to spend much more on legal fees trying to fight this than they would lose in potential audiobook sales (so far I haven't seen anyone say that they would use their Kindle instead of purchasing the audio version).
by radamo7 February 27, 2009 10:22 AM PST
Why should there be differing royalty schemes for audio vs. print of the same material... especially when all you are getting with an ebook is the book... To me this is exactly the same as my computer using voice software to read me my ebook material. I really think they are reaching on this one.
RA