How voice search technology listens to a query and returns information is complex, but Google has attempted to explain the mechanisms behind its voice search app in a new research paper it posted earlier today.
Basically, it boils down to data, and lots of it.
According to Google, more data improves all Web services. This may seem obvious, but for better speech recognition, what matters is not only the sheer amount of data but also how that data is organized. Google's voice search technology mainly draws on anonymized queries from Google.com to get the information it needs.
"The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones," Google research scientist Ciprian Chelba wrote in a blog post today about the research. "As an example, if the previous words are 'new york,' the model would assign a higher probability to 'pizza' than say 'granola.'"
In conducting its voice search evaluations, Google scientists used up to 230 billion words from "a random sample of anonymized queries from Google.com that did not trigger spelling correction."
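To make the idea of a language model concrete, here is a minimal sketch in Python of a model that estimates the probability of the next word from the two words before it, using simple counts. It is an illustration only and not Google's implementation: the handful of sample queries, the function names, and the choice of a plain two-word context with no smoothing are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy queries standing in for the anonymized query data.
SAMPLE_QUERIES = [
    "new york pizza",
    "new york pizza delivery",
    "new york weather",
    "best new york pizza",
    "granola recipe",
    "new york granola",
]


def train_counts(queries):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for query in queries:
        words = query.split()
        for i in range(2, len(words)):
            context = (words[i - 2], words[i - 1])
            counts[context][words[i]] += 1
    return counts


def next_word_probability(counts, context, word):
    """Estimate P(word | context) as a relative frequency (no smoothing)."""
    following = counts.get(tuple(context))
    if not following:
        return 0.0  # context never seen in the training queries
    return following[word] / sum(following.values())


counts = train_counts(SAMPLE_QUERIES)
p_pizza = next_word_probability(counts, ("new", "york"), "pizza")
p_granola = next_word_probability(counts, ("new", "york"), "granola")
print(f"P(pizza | new york)   = {p_pizza:.2f}")    # 0.60
print(f"P(granola | new york) = {p_granola:.2f}")  # 0.20
```

With only six queries the estimates are crude, which is the point Google is making: the more query data the counts are drawn from, the more reliable the probabilities become.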
Chelba concludes that with such a big data set, word error rate can be reduced by 6 to 10 percent, and that for systems with an even wider range of operating points, the reduction can be between 17 and 52 percent.
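Word error rate is commonly computed as the number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The sketch below follows that common definition. The function name and the sample phrases are made up for illustration, and nothing here is taken from Google's paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four gives a word error rate of 0.25.
wer = word_error_rate("new york pizza delivery", "new york pisa delivery")
print(wer)  # 0.25
```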
Google's new voice search app, which looks to be competing directly with Apple's Siri, launched yesterday. The software went out as part of an update to Google's search application for iOS; it provides contextual, spoken results for voice queries and serves up Web searches for everything else. According to a CNET review, this app gives Siri a run for her money. It is lightning fast, has a clean layout, and gives highly accurate results.