IBM voice ace: Kindle no threat to audio books
Executives at the Authors Guild say the text-to-speech feature in Amazon's Kindle 2 could hurt sales of audio books. Not all experts agree, including one the guild itself cited.
Andy Aaron, an IBM text-to-speech expert, says synthetic voices don't know when to add emphasis or inflection when reading.
(Credit: Andy Aaron)
Andy Aaron, an expert on text-to-speech technology, recently commented in an interview on how far such systems have advanced. In an op-ed titled "The Kindle Swindle?" published Tuesday in The New York Times, Roy Blount Jr., president of the Authors Guild, used Aaron's quotes to support his argument that the Kindle's voice feature could threaten the future of audio books.
But when asked to elaborate, Aaron told CNET News on Wednesday that the audio-book market has little to fear from "synthetic voices."
"I'm a big believer in (text-to-speech) and a booster of it," said Aaron, who is with IBM's Watson Research Center. "But I don't think at this point, or for the foreseeable future, it's going to compete meaningfully with a professional book reader... Am I going to sit down and put my feet up and listen to text-to-speech read 'War and Peace' or Harry Potter for six to eight hours? For someone who has the choice, I think they would rather get an audio book."
Amazon appears headed toward a showdown with the Authors Guild over text-to-speech technology, which enables computers to read text aloud in a lifelike voice. Paul Aiken, executive director of the Authors Guild, a trade group representing 9,000 authors, argues that Amazon isn't compensating authors for the Kindle's text-to-speech feature and claims that authors' copyrights are being violated.
Amazon representatives did not respond to a request for comment.
Aiken drew a lot of attention when he first raised concerns about the Kindle 2 following the e-book reader's debut earlier this month. On Wednesday, Aiken said Amazon never informed the guild, or book publishers for that matter, of the retailer's plan to include the feature.
In the weeks since the Kindle's debut, the guild has held discussions with Amazon, and the online retailer is taking a "hard-line position," Aiken said. That doesn't bode well for an amicable resolution.
Aiken wouldn't say what the guild's plans are but confirmed that guild administrators won't rule out filing a lawsuit.
"Anytime you have a new means of accessing content," said Aiken, "there's always some sort of aggregator that wants to control it and keep the value for themselves."
As for Aaron's assertions that text-to-speech systems won't threaten audio books for a long time, Aiken says nobody knows the future.
"Things move quickly," Aiken said. "I think the technology has made a generational leap in just the last few years."
To prove the point, the guild has posted demonstrations of text-to-speech technologies offered by Apple four years ago (the video posted above). The voice is monotone and unintelligible in places. It sounds like it was lifted from a bad sci-fi film.
The next clip is a recording of the Kindle's text-to-speech offering. (At right, I've included a humorous demonstration of the Kindle's text-to-speech function posted to YouTube by a user called Kindlejunkie.) The differences are sharp. The Kindle's voice pronounces words clearly and sounds far more lifelike. There is, however, no inflection or emphasis. The thing drones on.
It's not that the technology can't create dramatic effects. Aaron says the technology has advanced to a point where synthetic voices can be made to sound happy or apologetic. The major roadblock for these systems, however, is that they don't know when to insert these effects or choose the effect that is most appropriate.
What's missing in computers is the ability to understand what they're reading, said Aaron.
"Even a mediocre human reader is interacting with the text and understands every word that he or she is reading," Aaron said. "Text-to-speech doesn't. It can be really good. It can be really smooth. It can sound very lifelike. But it doesn't understand what it's reading. Do you want to listen to a reader that doesn't understand what they're reading?"
The obvious question here is: if text-to-speech systems can read something with a specific emotional tone, couldn't a publisher go into a digital book and mark where it wants to insert a specific effect?
They could, says Aaron, but that would take an enormous amount of time and expense. At that point, it's easier to hire a human reader and create an audio book.
Here's a little about how a voice for text-to-speech is created. First, a professional reader is hired to read text written for its "phonemic diversity": the sentences are designed to cover a wide range of word sounds. The process takes more than 60 hours to complete, Aaron said.
Algorithms are used to help figure out how to manipulate the sounds correctly.
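The article doesn't say how IBM chooses its recording script. One common way to build a phonemically diverse script, assumed here purely for illustration, is a greedy set-cover pass: repeatedly pick the candidate sentence that adds the most not-yet-covered diphones (pairs of adjacent sound units). The minimal Python sketch below uses made-up ARPAbet-style transcriptions; the function names `diphones` and `greedy_script` and the sample data are hypothetical and not part of any IBM system.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phoneme pairs (diphones) in a transcription."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def greedy_script(candidates: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered diphones.

    `candidates` maps each sentence to a hypothetical phoneme transcription.
    Stops early once no remaining sentence adds any new coverage.
    """
    covered: Set[Diphone] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        # Score each remaining sentence by how many new diphones it contributes;
        # ties go to the earliest sentence in insertion order.
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) - covered))
        gain = diphones(remaining[best]) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Toy, made-up transcriptions (not real recording-script data).
candidates = {
    "the cat sat": ["DH", "AH", "K", "AE", "T", "S", "AE", "T"],
    "a cat": ["AH", "K", "AE", "T"],
    "big dog": ["B", "IH", "G", "D", "AO", "G"],
}

script = greedy_script(candidates, max_sentences=10)
print(script)  # prints ['the cat sat', 'big dog']
```

Here "a cat" is skipped because every sound pair it contains is already covered by "the cat sat". Greedy selection is a standard heuristic for set cover: it doesn't guarantee the smallest possible script, but it is simple and reaches broad coverage quickly.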
Aiken concedes that text-to-speech systems can't provide many of the dramatic effects that a human can. But he does think they're good enough to erode sales of audio books.
One thing to remember is that the potential to compete with audio books is only one part of the guild's complaint. Aiken argues that the Kindle's voice feature should be considered a separate derivative work and that authors should share in its revenues.
What is certain is that guild managers don't believe Amazon should give text-to-speech away for free just to help market Kindles.
"This should be considered a legitimate new market for publishers and authors," Aiken said. "It's a technology that should be used for incremental revenue. With all the squeezing that's going on in publishing, you just can't let this one go."
Greg Sandoval covers media and digital entertainment for CNET News. He is a former reporter for The Washington Post and the Los Angeles Times. E-mail Greg, or follow him on Twitter at http://twitter.com/sandoCNET. 

I think the big difference is that Amazon is selling the books and advertising that a benefit of buying the book through the Kindle, as opposed to other sources, is that you get an audio book experience as well as a text experience. Remember, these authors aren't selling plain-text versions of books that can be read by any computer. Amazon promised to protect the text version with its DRM technology. However, Amazon, without the authors' consent, is breaking its own protection to give itself an exclusive advantage over all other sellers of the material. I think the authors feel that the only one who should be able to assign such exclusive rights is the original copyright holder. And remember, Amazon has contracts, which it has a special duty to execute in good faith, with many of these same authors to sell audio versions of their works.
Also, I don't know if you read the article above, but it is NOT an audio book experience. There is no inflection, different voices for different characters in the story, no emotion. It's a flat reading of text.
Additionally, you are ignoring fair use (applicable in the US only):
From http://w2.eff.org/IP/eff_fair_use_faq.php:
Space-shifting or format-shifting - that is, taking content you own in one format and putting it into another format, for personal, non-commercial use. For instance, "ripping" an audio CD (that is, making an MP3-format version of an audio CD that you already own) is considered fair use by many lawyers, based on the 1984 Betamax decision and the 1999 Rio MP3 player decision (RIAA v. Diamond Multimedia, 180 F. 3d 1072, 1079, 9th Circ. 1999.)
So, if taking a raw digital audio track from a CD and converting it to a digital file is legit, so is taking a digital file and converting it to raw audio, provided it is for your exclusive personal use. You are not permitted to redistribute it. Copyright only protects a work against reproduction for distribution, not for personal use.
Yes. And that would be you, Mr. Aiken, to the detriment of the consumer who has paid for that content.
--mark d.
- by radamo7 February 27, 2009 10:22 AM PST
- Why should there be differing royalty schemes for audio vs. print of the same material, especially when all you are getting with an ebook is the book? To me this is exactly the same as my computer using voice software to read me my ebook material. I really think they are reaching on this one.