Saturday, February 19, 2011

Watson vs. The Champs: Post-match commentary

Samanth Subramanian also notes the alacrity with which Watson got to the buzzer so consistently. He provides some info on how the Jeopardy rules work:

... The Jeopardy buzzer system is set up such that you can’t buzz until Alex has finished reading the entire question in his slow, perennially wry drawl; if you do, you’re locked out, and you have to watch in frustration as somebody else answers. So I figured that computer and humans both had sufficient time to take the question in and work out whether they knew the answer or not. To boot, Watson didn’t buzz electronically; he (it? his highness? Master?) had to physically depress a buzzer like Ken and Brad.

It begins to make sense to me now. When the questions took Alex a long time to read, Watson got the end-of-question signal electronically, and probably had an advantage over the humans, who got it visually. On the other hand, when the questions were short, the humans had the advantage. I don't know if this interpretation is correct, but take a look at the first part of Day 3. There was one category on "Actors who direct" in which the clues were just movie titles, and therefore short. And guess what? It was the human contestants who got to the buzzer faster than Watson -- for all five questions in that category.

There has been some commentary on Watson and what it means for natural language processing. Ben Zimmer's piece at The Atlantic is about the best. Here's an excerpt where he red-flags a bunch of wild claims made in a pre-show hype ad by Dave Ferrucci, who led the Watson team.

I first encountered IBM's hype about the tournament last month, during the NFL's conference championship games, when Dave Ferrucci, the ebullient lead engineer on the project, showed up in commercial breaks to tell us about the marvels of Watson. One commercial intriguingly opens with Groucho Marx telling his classic joke: "One morning I shot an elephant in my pajamas. How he got into my pajamas, I don't know." Ferrucci then begins: "Real language is filled with nuance, slang and metaphor. It's more than half the world's data. But computers couldn't understand it." He continues: "Watson is a computer that uncovers meaning in our language, and pinpoints the right answer, instantly. It uses deep analytics to answer questions computers never could before, even the ones on Jeopardy!" Then a Jeopardy! clue is displayed: "Groucho quipped, 'One morning I shot' this 'in my pajamas.'"

Now, that's a provocative set of claims. Watson's performance in the tournament (despite a few howlers along the way) clearly demonstrates that it is very skilled in particular types of question-answering, and I have no doubt it could handle that Groucho clue with aplomb. But does that mean that Watson "understands" the "nuance, slang and metaphor" of natural language? That it "uncovers meaning in our language"? Depends what you mean by "meaning," and how you understand "understanding."

And Zimmer punctures the hype with this verdict:

... [F]or all of the impressive NLP programming that has gone into Watson, the computer is unable to penetrate the semantics of language, or comprehend how meanings of words are shot through with allusions to human culture and the experience of daily life.

Finally, a link to a post-show discussion held at IBM and moderated by Stephen Baker, author of Final Jeopardy: Man vs. Machine and the Quest to Know Everything.


  1. Ankur Kulkarni said...

    Thanks for the Zimmer piece. It pretty much answers what I have been sceptical about. Watson is great at playing this game. It would perhaps have won even if it had been receiving the question somewhat more slowly than it was. But from what we have seen, it is not clear if it is any good at understanding language and its meaning.

    What would have been interesting is if IBM had given Watson some capability by which it could chit-chat or banter with the other contestants and Alex. That would have given me more confidence about its ability to understand natural language.

  2. Unknown said...

    Regarding the buzzer, I still think Watson had an advantage. Consider this situation: Ken, Rutter and Watson know the answer to the question and are going to press the buzzer when it's time. The variability in Watson's response time would, by design, probably be lower than Ken's or Rutter's. So, while it is possible for Ken or Rutter or somebody else to buzz faster than Watson, if you were to plot a distribution of response times, Watson's would be narrower. Of course, this is only my feeling and one would have to look at actual numbers, but isn't doing a mundane job consistently what machines are supposed to be good at?

    In fact, if you think of it, Ken and Rutter will back off, i.e., be conservative and wait for a fraction of a second after the green light (because if you buzz earlier, you get locked out). Watson need not be this conservative.

    Regarding Ken doing much better at shorter questions, I have a different take. We have seen how the category by itself is not of much use to Watson (Toronto and US cities). So, shorter questions mean less information for Watson to look at (commonalities, repeated occurrences, etc.).

    Ken or Rutter or my humble self would have pretty much known that it is going to be definitely about actors who direct (remember Watson might not have been so sure, e.g., Toronto) and then mentally prepared a list of directors and actors and movies. Clearly, the intersection of directors and actors and famous movies is a small one. So, Ken is well prepared even before the question is out. Does Watson get prepared too? I haven't seen. So, when the question is out, Ken zeroes in on the answer.

    Of course, this is all conjecture and I know nothing about NLP, question parsing, AI or how Watson does all these things.


  3. Anonymous said...

    I think you are nit-picking here. The important thing here is that Watson gets the correct answer, not whether it presses the button first. Sooo, even when the humans answer the question, you can see if Watson gets the answer right. To counter your theory, there are a few long questions on which the humans pressed the buzzer first. Oh, BTW the answers to actors who direct would be a simple index look-up (once you decide which index you are going to use). I think Watson can come up with the answer faster than the humans. For me, I am thrilled that Watson got so many answers right (even if it did press the buzzer first). Even though the bloopers would worry me a little, this has got huge potential in a lot of applications (like Knowledge Management, System Integration, etc.).

  4. Abi said...

    @Raj: The buzzer-related "nit-picking" is a follow-up on an earlier discussion thread, in which I did say this:

    "But this discussion about being first to buzz should not take anything away from the machine's amazing capabilities in parsing natural language. I'm sure you noticed the graphic, throughout the show, with Watson's guesses which were right, like, 80+ percent of the time! Isn't that awesome?"