Executives at the Authors Guild say the text-to-speech feature in Amazon's Kindle 2 could hurt sales of audio books. Not all experts agree, including the one the guild itself cited.
Andy Aaron, an expert on text-to-speech technology, recently commented in an interview about how much such systems have advanced. In an op-ed piece published Tuesday in The New York Times titled "The Kindle Swindle?" Roy Blount Jr., president of the Authors Guild, used Aaron's quotes to support his argument that the Kindle's voice feature could threaten the future of audio books.
But when asked to elaborate, Aaron told CNET News on Wednesday that the audio-book market has little to fear from "synthetic voices."
"I'm a big believer in (text-to-speech) and a booster of it," said Aaron, who is with IBM's Watson Research Center. "But I don't think at this point, or for the foreseeable future, it's going to compete meaningfully with a professional book reader... Am I going to sit down and put my feet up and listen to text-to-speech read 'War and Peace' or Harry Potter for six to eight hours? For someone who has the choice, I think they would rather get an audio book."
Amazon appears headed toward a showdown with the Authors Guild over text-to-speech technology, which enables computers to read text aloud in a lifelike voice. Paul Aiken, executive director of the Authors Guild, a trade group representing 9,000 authors, argues that Amazon isn't compensating authors for the Kindle's text-to-speech feature. He claims authors' copyrights are being violated.
Amazon representatives did not respond to a request for comment.
Aiken generated a lot of attention when he first raised concerns about the Kindle following the e-book reader's debut earlier this month. On Wednesday, Aiken said Amazon never informed the guild--or book publishers for that matter--of the retailer's plan to include the feature.
In the weeks since the Kindle debut, the guild has had discussions with Amazon, and the online retailer is taking a "hard-line position," Aiken said. None of this bodes well for an amicable resolution.
Aiken wouldn't say what the guild's plans are but confirmed that guild administrators won't rule out filing a lawsuit.
"Anytime you have a new means of accessing content," said Aiken, "there's always some sort of aggregator that wants to control it and keep the value for themselves."
As for Aaron's assertions that text-to-speech systems won't threaten audio books for a long time, Aiken says nobody knows the future.
"Things move quickly," Aiken said. "I think the technology has made a generational leap in just the last few years."
To prove the point, the guild has posted demonstrations of text-to-speech technologies offered by Apple four years ago (the video posted above). The voice is monotone and unintelligible in places. It sounds like it was lifted from a bad sci-fi film.
The next clip is a recording of the Kindle's text-to-speech offering. (At right, I've included a humorous demonstration of the Kindle's text-to-speech function posted to YouTube by a user called Kindlejunkie.) The differences are sharp. The Kindle's voice pronounces words clearly and sounds far more lifelike. There is, however, no inflection or emphasis. The thing drones on.
It's not that the technology can't create dramatic effects. Aaron says the technology has advanced to a point where synthetic voices can be made to sound happy or apologetic. The major roadblock for these systems, however, is that they don't know when to insert these effects or choose the effect that is most appropriate.
What's missing in computers is the ability to understand what they're reading, said Aaron.
"Even a mediocre human reader is interacting with the text and understands every word that he or she is reading," Aaron said. "Text-to-speech doesn't. It can be really good. It can be really smooth. It can sound very lifelike. But it doesn't understand what it's reading. Do you want to listen to a reader that doesn't understand what they're reading?"
The obvious question here: If text-to-speech systems can read something with a specific emotional tone, couldn't a publisher go into a digital book and mark where it wants to insert a specific effect?
It could, says Aaron, but that would take an enormous amount of time and expense. At that point it's easier to hire a human reader and create an audio book.
Here's a little bit about how a voice is created for a text-to-speech system. First, a professional reader is hired to read text created for its "phonemic diversity." The sentences are designed to cover a wide range of word sounds. The process takes more than 60 hours to complete, Aaron said.
Algorithms are used to help figure out how to manipulate the sounds correctly.
Aiken concedes that text-to-speech systems can't provide many of the dramatic effects that a human can. But he does think they're good enough to erode sales of audio books.
One thing to remember is that the potential to compete with audio books is only one part of the guild's complaint. Aiken argues that Kindle's voice feature should be considered a separate derivative and authors should share in its revenues.
What's certain is that guild managers don't believe Amazon should give text-to-speech away for free just to help market Kindles.
"This should be considered a legitimate new market for publishers and authors," Aiken said. "It's a technology that should be used for incremental revenue. With all the squeezing that's going on in publishing, you just can't let this one go."