Typing on the iPhone/iPod Touch's keyboard can be arduous. This is never more evident than when trying to bang out messages in several instant-messaging conversations at once. Shape Services, the makers of the popular IM+ instant-messaging app ($9.99 App Store link), have realized this, and are soon rolling out a new version of the app that includes speech-to-text, albeit at a price.
Taking advantage of Apple's recently released in-app payment system, 99 cents a month gets you the feature, meaning that the cost of continuing to use it is about $12 a year. Not bad if you're a heavy user. But how well does it work?
In short, it does a decent job, but it still experiences some of the typical pitfalls found in other speech-to-text tools. If you've used Google's search app on the iPhone you know all too well that it can handle some words better than others, and that it works slightly faster when you're on Wi-Fi. The same can be said of IM+.
The app managed to get a few sentences without flaws, but I regularly found myself going in to make a quick edit to one or two words each time. That wouldn't be so bad if it didn't take so long to do all the processing. Over 3G, small quips like a four- or five-word reply took around 15 seconds to process and get sent back, whereas full messages took up to 24 seconds. These times were cut a few seconds shorter when on a solid Wi-Fi connection, but were still on the long side.
The updated version of the app is in Apple's review queue, meaning it could be out later this week or this month, or be rejected outright (although that's not likely, since it's using standard APIs). Besides speech-to-text, the update also adds animated emoticons for whatever service you're using. It's a small touch, but sure to make IM enthusiasts happy. We take a quick look at that and the speech-to-text feature in the video below. Worth noting is that the processing waits in the video have been sped up for the sake of time, although we make note of that when it occurs.
Box.net has added iSpeech to its OpenBox platform, which gives users integrated text-to-speech conversion on any text document they have stored on the service. Users simply have to add it to their list of OpenBox services and it becomes a part of Box.net's contextual menus, meaning you only need to right-click on the document and choose the text-to-speech option to get it going.
You do have to be separately signed up with iSpeech to get this to work. It's not a free service when it comes to processing full-length documents. There is, however, a free tier of service that gives you 250 words per conversion, which amounts to a couple of paragraphs. If you feel like converting your doctoral thesis, you'll need one of the service's premium plans.
While neat, I think a far more useful add-on to Box, or any other storage provider, would be converting audio recordings into text documents. I've recently become hooked on this with my voicemail through Google Voice, and it would be great to get a recording from an interview or business meeting transcribed in the same place I'm storing it in the cloud.
A right click on any document file will let you convert it to speech using iSpeech. You have to be registered with that service to use it though.
(Credit: CNET Networks)
Google has elevated the profile of its attempt to make videos searchable through speech recognition technology, a move that portends a potentially more financially successful YouTube division.
The speech recognition technology was used in an online application, launched in July, that let people search political speeches, and now the Gaudi (Google Audio Indexing) project has an official interface at Google Labs.
Google Audio Indexing (Gaudi) lets people run a text search on some YouTube videos.
(Credit: Google)
The site's search box has instructions: "Search what the politicians are saying." The search results are presented next to a YouTube video player, and clicking each result sets the player to show the part of the video where the words were spoken. It doesn't just show speeches--a search for "bridge to nowhere" also returned the "Real Mavericks" ad from the John McCain-Sarah Palin campaign.
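To make the "click a result, jump to that moment" idea concrete, here is a minimal Python sketch of how a phrase search over a timestamped transcript could work. It is not Google's implementation, which the article doesn't describe. The `Word` record, the `find_phrase` helper, and the sample transcript are all hypothetical, and the sketch assumes the speech recognizer has already produced one lowercased word per entry with a start time in seconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # one recognized word, lowercased (assumed recognizer output)
    start: float  # seconds from the start of the video


def find_phrase(transcript: list[Word], phrase: str) -> list[float]:
    """Return the start time of every place the phrase occurs in the transcript."""
    target = phrase.lower().split()
    if not target:
        return []
    hits = []
    # Slide a window the length of the phrase across the transcript.
    for i in range(len(transcript) - len(target) + 1):
        window = [w.text for w in transcript[i:i + len(target)]]
        if window == target:
            hits.append(transcript[i].start)
    return hits


# Made-up transcript fragment, for illustration only.
transcript = [
    Word("we", 12.0), Word("stopped", 12.4), Word("the", 12.9),
    Word("bridge", 13.1), Word("to", 13.6), Word("nowhere", 13.8),
]
offsets = find_phrase(transcript, "Bridge to Nowhere")
```

Each value in `offsets` is a point a video player could seek to. A real system would also have to cope with misrecognized words and rank results, which this sketch ignores.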
Extracting words from videos could make it easier for Google to determine what content is in a video and therefore what ads are most appropriate to show next to it. Making money from YouTube is a top priority this year.
Speech-to-text conversion also could help Google blend relevant videos into search results. Currently, the best way to understand what's in a video is by examining the accompanying metadata, such as titles and captions, but that's often much narrower than what's spoken.
And with Google's translation work, it's possible that the company could translate videos' transcribed text into other languages.
Clearly, Google has big ambitions for the audio recognition technology. "The aim of Google Audio Indexing on Google Labs is broader (than that of the and the Google Elections Video Search gadget), and the U.S. election is just a first step. We see it as an experiment platform where we can learn what features make the best user experience for people looking for spoken content on the Web," the company said in a frequently-asked-questions page about the Google Audio Indexing project.
Google is beginning with political information because it's trying to become a prominent part of the democratic process and because political speeches receive a lot of attention, the company said. Also, presumably because politicians generally don't mumble as much as the rest of us, the speech recognition technology performs better, Google said.
(Via Google Operating System.)
If you're already bored of getting English translated to Mandarin through JaJah, TwitterFone, another mobile service with voice recognition savvy, has put out a neat update that's sure to burn through your mobile phone minutes. You can now listen to the last 10 tweets from your Twitter pals and respond to any of them that you'd like using the same speech-to-text system in place for publishing tweets of your own.
It's certainly not as fast or easy to parse voice messages as the mobile version of Twitter (m.twitter.com), but if you're on an older handset and don't have a data plan, this is about as easy as it gets to stay in touch with Twitter without buying new hardware. It's also nice enough to list the full names of Twitter users, not just their user name, which could be a good or bad thing depending on how well you know the people you're following.
One thing that was slightly off for me was the time stamping, with tweets from just a few minutes ago being listed as a full hour behind, at least according to TwitterFone's automated system. I'm assuming this is a kink that will be worked out in the future. Otherwise, if you're a big fan of sitting back and enjoying some blurbs from your friends while on the go (spoken like sweet nothings by a female robot), then TwitterFone is right up your alley.
TwitterFone is still in private beta.
Related: Dial2Do: Speak your Twitters, e-mails, SMS messages and more
Listening to MP3s of robotic voices reading stories from the Web is a good way to prepare for the eventual downfall of mankind at the hands of our robotic overlords. If you're into that kind of thing, Hearwho will do all the heavy lifting for you by converting any text you feed it into a downloadable MP3 file.
If you've spent hours amusing yourself playing with AT&T's text-to-speech demo, you'll be glad to know that Hearwho does away with the somewhat annoying 300-character limit. I dumped an entire 800-word story into the text box and had a playable MP3 file in around 15 minutes.
To speed up the process, you can either use less text or dial down the quality. There are four compression settings to crunch a file down to a proper size. You can also toggle between male and female speakers, and whether you want it spoken in English or Spanish.
One thing to note is that if you're doing this with blog posts, the blog you're grabbing the feed from might already have an audio-RSS feed that you can simply subscribe to. A good directory of these feeds can be found at Stitcher, which also has a really neat iPhone Web app.
Note: Hearwho's servers are pretty slammed right now. Your text might take a half hour or longer to get converted--which is abnormal.
[via Lifehacker and Techie Portal]
Who doesn't like listening to computer-generated human voices for hours at a time? If you're a fan of Microsoft Sam, you should check out Dixero, a service that turns RSS feeds into podcasts you can subscribe to and listen to on your computer or portable devices. The company is showing off its products at this week's Web 2.0 Expo, despite the incredibly noisy show floor.
The listening quality is about the same as Odiogo, a service I looked at a few months back and have used with great success on blogs and news sites that have it integrated. What makes Dixero neat is that you can choose one of three types of voices you'd like to listen to the posts with. It's also nice enough to take your entire OPML file and let you pull in those feeds, then pick which ones you want to fit into individual channels.
The actual player is a little less extensible, offering a simple play/pause button and the option to skip back and forth between posts. You can also grab the RSS feed and subscribe to the feeds as a podcast in your favorite feed catcher. What it's missing is a way to embed it on other sites or swap between the voices--something that's left to the whim of the creator.
Dixero's player lets you skip between blog posts as audio files.
(Credit: CNET Networks)
Here's a neat service for blog owners who want to add another layer of distribution for their content. It's called Odiogo, and it will take any written blog entries and turn them into spoken word. It uses an integrated player that sticks itself on top of every blog post, and lets readers listen to any post in lieu of reading.
I came across the service while reading a post on UNEASYsilence about hacking the new eeePCs to run a hacked version of OS X Leopard (which apparently runs about as well as it can on the aged processor), and was treated to a 5-minute computer rendition of step-by-step terminal commands complete with detailed installation instructions. While a bit tedious to listen to after a minute or two (one of several reasons text-to-speech services are still not more widely adopted), Odiogo's digital voice is definitely a step up from the last generation of computer-generated speakers.
To actually add the tool to your blog, there are plug-ins and bits of JavaScript code site owners can integrate into their blog installation or hosted template. I installed it on a hosted WordPress tester blog in about two minutes and ran into no problems whatsoever. The service was also able to slurp up all 30 or so entries and convert them into spoken text in less than an hour from the time I had originally signed up for the service, which ain't too shabby.
Odiogo will take any text it can pick up from a blog post and crunch it down into spoken words you can listen to right on the blog, or pull down as a podcast to put on your PMP.
(Credit: Odiogo)
Once installed, the service will go to work on all of your previously published posts and make new ones available for listening within a few hours. What's more, it'll syndicate all your posts into feeds that can be added to your RSS reader or whatever program you use to pull down podcasts for listening on the go. While it's certainly not as efficient as reading blog posts in Google Reader, you could use the service to listen to your favorite sites on a portable music player while out and about.
Despite its speed and simplicity, the service has a few quirks, not only in the speech department (which still suffers from inflection issues), but also in the integrated Flash player. While it's super quick to load and can crunch relatively long posts into small files, there's no volume control slider, so be prepared to turn down your speakers or headphones if you've got your system's volume jacked up.
You can already find Odiogo integrated into several blogs, including the aforementioned UNEASYsilence and The Jerusalem Post. To try the service and find out more about how it works, you can also check out the site's demo page. Also worth checking out is a handy plugin for Firefox called CLiCK, Speak, which will add text-to-speech on any site you're looking at.
ReQall is a telephone-based service that records notes you speak into it. I saw a demo of this service at Demo 07 and just recently got access to the private beta. It's like an automated secretary: You talk into your phone, and it transcribes what you say and sends it to you as e-mail. The service could be very useful--if it learns to play well with other products.
Talk about a late lunch...
(Credit: CNET Networks)
As is, it's still handy. If you're driving and remember something you need to do, you can just speed-dial ReQall on your cell phone, dictate a note, and when you get back to the office you'll see the task in your e-mail. In my tests from a new cell phone, the voice recognition was spot on (I used a Samsung BlackJack, which is known for good voice quality). It took several minutes for voice notes to get transcribed and sent, though, and I didn't try it from a moving car.
And although ReQall knows the difference between tasks, notes, and appointments, it doesn't do much with that information. All voice notes are sent to your e-mail and flagged with their type, but they're not otherwise packaged as appointments or tasks. So, for example, to make a ReQall note like, "Lunch today with Sam," into a meeting, you have to create a new meeting in your PIM and copy the text over. It would be much better if ReQall knew what calendar you used and either sent you the appropriate attachment (for Outlook users) or logged into your Web-based calendar (like Google or Yahoo) and added it for you. Also, note to ReQall programmers: Lunch is usually at noon or thereabouts, not 12 a.m.
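For a sense of what "the appropriate attachment" could look like, here is a minimal Python sketch that turns a transcribed note into an iCalendar (.ics) event, the file format Outlook and most Web calendars can import. This is only an illustration, not anything ReQall does or has announced. The `note_to_ics` and `escape_text` helpers, the example date, the one-hour default length, and the `example.invalid` identifiers are all assumptions.

```python
from datetime import datetime, timedelta


def escape_text(value: str) -> str:
    """Escape the characters iCalendar treats as special inside text values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def note_to_ics(summary: str, start: datetime, created_utc: datetime,
                minutes: int = 60) -> str:
    """Build a one-event calendar file from a note (hypothetical helper)."""
    fmt = "%Y%m%dT%H%M%S"
    end = start + timedelta(minutes=minutes)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//voice-note-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{created_utc.strftime(fmt)}-note@example.invalid",
        f"DTSTAMP:{created_utc.strftime(fmt)}Z",  # creation time, in UTC
        f"DTSTART:{start.strftime(fmt)}",         # local "floating" time
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{escape_text(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Made-up date; lunch is scheduled for noon rather than 12 a.m.
ics = note_to_ics(
    "Lunch today with Sam",
    start=datetime(2007, 2, 1, 12, 0),
    created_utc=datetime(2007, 2, 1, 17, 30),
)
```

Saved with an `.ics` extension and attached to the notification e-mail, a file like this would let the recipient add the meeting with a click instead of retyping it into a PIM.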
Update: I'm told that ReQall's speech-to-text engine isn't wholly automated. "We use a combination of automated speech recognition technology and human transcription," a company co-founder told me. Which means there may be someone listening to your notes and to-do items. Yikes!
Sci-fi time at Demo 07: call in to Qtech's ReQall, speak your ideas and to-do lists, and the service then does smart things with them. For example, if you save a task for a particular date, ReQall will feed it to you when you call in on that future day.
It also does speech-to-text conversion, and can e-mail you your notes, or put them on a desktop widget.
Pretty cool idea. For me, it depends on the speech-to-text quality. Can't wait to try it out.