When you consider mashing up supercomputers and games, there's little doubt that many people think of IBM's Deep Blue Grand Challenge project--which beat world chess champion Garry Kasparov in a famous 1997 showdown--as the standard by which all future projects would be judged.
Now, IBM is trying to outdo itself with Watson, another supercomputer Grand Challenge that, this time, will attempt to beat the world's most successful players of the long-running hit TV game show "Jeopardy."
And while "Jeopardy" might not be the first game show to cross your mind as being worthy of a full-scale four-year IBM Research project, Big Blue thinks that the Alex Trebek-hosted show offers one of the most important natural-language processing challenges it has ever come across.
IBM has been working on--and talking about--the project, code-named Watson, for some time. And this fall, it conducted dozens of tests, pitting Watson against a series of former "Jeopardy" players to see if it was prepared to take on the best in the world.
And now, IBM has decided Watson is ready. On Monday, the company announced that Watson will take on two of the most successful "Jeopardy" players in history, Ken Jennings and Brad Rutter, next February, in a bid to see if its computer is good enough to beat the best humans at this most abstract of word games, and demonstrate its natural-language processing utility for a wealth of other fields as well.
Yesterday, Watson Research Manager Eric Brown sat down for a 45 Minutes on IM interview to talk about the program and to tout the computer's chances of beating "Jeopardy" kingpins like Jennings and Rutter, who, between them, have won more than $5.75 million.
Q: Well, I want to thank you for taking the time to do this. To start, I wonder if you could quickly sum up the Watson project from your perspective for those readers that aren't familiar with it?
Brown: The Watson project is a Grand Challenge project being pursued by IBM to build a computer system that can compete on "Jeopardy" at the level of a human grand champion. To solve this Grand Challenge, we have built an automatic open domain question answering system, called Watson. Watson is built on top of IBM's DeepQA (for Deep Question Answering) technology.
Why is "Jeopardy" a game worthy of being the follow-up to Deep Blue? I understand why--but I wonder if a lot of people would think of "Jeopardy" being worthy of being put alongside chess as an intellectual challenge.
Brown: Before Deep Blue, people thought it was impossible to build a computer system that could beat a grand master at chess, which made that a very interesting Grand Challenge problem. But chess is fairly mathematical and well defined--each game state and the corresponding possible moves can be easily represented by a computer. "Jeopardy" requires understanding natural human language, which, unlike chess, is completely open-ended, is often ambiguous, and requires context to understand. Although humans can easily understand language, building computer systems that understand natural human language is extremely challenging. "Jeopardy" is a fantastic way to push the limits of this technology.
I watched the "Why Jeopardy" video, and I was struck by something someone said--the idea of Don't answer a question if you don't think you've got it right. Does that happen with Watson? And if so, why would it not know the answer?
Brown: That's a key element of "Jeopardy"--if you get the answer wrong, you are penalized and the value of the clue is subtracted from your score--not unlike in business where, if you make wrong decisions with bad information, you will get penalized. This means that not only must Watson come up with the correct answer but also a meaningful confidence in the answer to decide whether or not to even attempt the clue. As to why Watson would not know the answer, perhaps the question should be, how could Watson know any of the answers?
Here are a few things to consider. First, when playing "Jeopardy," Watson must be completely self-contained--it cannot be connected to the Web. All of the content Watson uses to answer questions is identified ahead of time, before seeing the questions.
Second, "Jeopardy" clues can cover any topic. In fact, we analyzed a random sample of 20,000 clues and found 2,500 different kinds of things the clues ask about. With such a broad domain, we couldn't possibly predict every clue "Jeopardy" might ask and build a database of answers. Instead, the DeepQA technology that underpins Watson reads millions of pages of text and uses deep natural language processing techniques to generate candidate answers and evaluate those answers along many different dimensions.
Finally, the "Jeopardy" clues are expressed using complex, often tricky, natural human language. Just understanding what the clue is asking for is a challenge.
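To make the scoring rule Brown describes concrete, here is a minimal Python sketch of a confidence-gated decision to attempt a clue. It is an illustration only, not IBM's DeepQA implementation: the `Candidate` class, the evidence dimension names, the weights, and the weighted-average confidence are all hypothetical. The one piece taken from the interview is the penalty rule: a wrong answer subtracts the clue's value, so with confidence p the expected gain on a clue worth V is pV - (1 - p)V = (2p - 1)V, which is positive only when p is above 0.5. The sketch ignores opponents and game strategy.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate answer with per-dimension evidence scores in [0, 1] (hypothetical)."""
    answer: str
    evidence_scores: dict


def confidence(candidate, weights):
    """Combine evidence scores into one confidence value via a weighted average.

    This combination rule is an assumption chosen for simplicity.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * candidate.evidence_scores.get(name, 0.0)
        for name, weight in weights.items()
    )
    return weighted / total_weight


def expected_gain(p, clue_value):
    """Expected score change: win the clue value with probability p, lose it otherwise."""
    return p * clue_value - (1 - p) * clue_value


def decide(candidates, weights, threshold=0.5):
    """Return the best answer if its confidence beats the threshold, else None (stay silent)."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: confidence(c, weights))
    if confidence(best, weights) > threshold:
        return best.answer
    return None


if __name__ == "__main__":
    # Hypothetical evidence dimensions and weights.
    weights = {"text_match": 2.0, "type_match": 1.0}
    candidates = [
        Candidate("answer A", {"text_match": 0.9, "type_match": 0.6}),
        Candidate("answer B", {"text_match": 0.3, "type_match": 0.3}),
    ]
    print(decide(candidates, weights))
    print(expected_gain(confidence(candidates[0], weights), 400))
```

A threshold of 0.5 is simply the break-even point under this penalty rule; a real player would likely adjust it based on the state of the game.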
Briefly, what is the source of the content Watson uses to answer questions?
Brown: Watson uses encyclopedias, dictionaries, news stories, books, and Web content, among other resources.
So, how did the Watson team decide that Watson was ready to take on the top "Jeopardy" champions?
Brown: Over the last four years of developing Watson, we've evaluated the system in two major ways. First, we run large test sets--say, 3,000 questions--in batch mode to evaluate system performance, conduct error analysis, and improve the system. Results over this many questions give us a statistically significant performance measurement.
The second way we've evaluated Watson is by competing in "sparring" matches against former "Jeopardy" players. Last winter we played 79 games against people that had appeared on "Jeopardy," and this past fall, we played 55 games against Tournament of Champions "Jeopardy" players. These sparring matches have provided a lot of insight into Watson's performance.
How confident are you that Watson can beat the champs? And how surprised would you be if one of the champs came out on top?
Brown: We are very confident that Watson will be competitive. However, the exhibition match is just two games, and anything can happen. Watson (or any player, for that matter) could get [unlucky] with the categories or the Daily Doubles. This is another reason why we played the sparring matches--to create a record over a much larger set of games.
After these large tests you've done, what kind of questions give Watson the hardest time?
Brown: Since we haven't played the final exhibition match yet, I can't give you any specifics. I will say that we're often surprised by some of the clues Watson can get right.
In one of the videos about Watson, I noticed a moment where, when asked to identify two of the men in the R.E.M. song, "It's the end of the world as we know it" with the initials "L.B.," Watson totally misunderstands and responds, "I feel fine." What had to change for it to get past those kinds of basic language misunderstandings?
Brown: The interesting point here is that humans might perceive that as a "basic language misunderstanding," but let's look at what's really going on. That kind of clue is challenging because of the many layers. You have to know the lyrics of the song, know what a "person" is, find the people in the lyrics, know what "initials" are, and match the initials to come up with the answer. This requires complex decomposition and nested processing.
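The layers Brown lists can be sketched as a small chain of steps in Python. This is a toy illustration, not how Watson works: the `SONG_MENTIONS` table is a tiny hand-entered stand-in for "knowing the lyrics" and "knowing what a person is," and the function names are hypothetical. A real system would have to extract and type these mentions from raw text on its own.

```python
# Hand-entered stand-in for lyric knowledge: phrases mentioned in the song,
# each tagged with a coarse type. A real system would have to derive this.
SONG_MENTIONS = {
    "It's the End of the World as We Know It": [
        ("Leonard Bernstein", "person"),
        ("Leonid Brezhnev", "person"),
        ("Lenny Bruce", "person"),
        ("Lester Bangs", "person"),
        ("birthday party", "thing"),
    ],
}


def initials(name):
    """Turn 'Lenny Bruce' into 'L.B.'."""
    return ".".join(word[0].upper() for word in name.split()) + "."


def people_with_initials(song, wanted):
    """Decompose the clue: look up the song, keep only people, then match initials."""
    mentions = SONG_MENTIONS.get(song, [])                               # step 1: know the lyrics
    people = [text for text, kind in mentions if kind == "person"]      # step 2: find the people
    return [name for name in people if initials(name) == wanted]        # step 3: match initials


if __name__ == "__main__":
    print(people_with_initials("It's the End of the World as We Know It", "L.B."))
```

Each step depends on the output of the one before it, which is the nesting Brown refers to: a mistake at any layer, such as treating a non-person phrase as a person, carries through to the final answer.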
Tell me what's surprised you most about working on this project?
Brown: I think the biggest surprise is how quickly we've been able to push the technology. When we started this project, our state-of-the-art question answering system at the time was nowhere near being competitive at "Jeopardy." Over the last four years, this team has made incredible progress and solved innumerable challenges, from natural language processing algorithms to scale-out and latency. Seeing it all come together has really been amazing.
Another surprising element is the way this challenge has resonated within IBM, with our customers, and with the academic community. People are really drawn to "Jeopardy" as a demonstration of this technology. It has been very rewarding for the entire team.
How can what your team has learned on the Watson project be applied to other real-world projects/problems?
Brown: Watson is an application of underlying technology that supports better decision making by evaluating candidate answers (or "hypotheses") with lots of different evidence and algorithms. We see a number of exciting applications of this approach in areas such as the medical domain, business intelligence, help desks, etc.
Finally (and this is the standard last question in this interview series), I like to do IM interviews for several reasons: it allows my guest to be more thoughtful and articulate than they might be in a phone or in-person interview; I get a perfect transcript; and instant messaging allows for multi-tasking. So, if you don't mind, can you tell me what else you were doing during the interview?
Brown: I had a few IMs from colleagues, and I've talked to a few people that were in and out of the meeting room I'm sitting in. But for the most part I've been focused on this interview.
Excellent. Well, thank you very much for your time. I'm very excited by this project, and I really look forward to seeing how it turns out.
Brown: Great, thanks very much for the opportunity to share this with you and your readers.