Watson Still Can’t Think
By STANLEY FISH

Many who responded to last week’s column about the relationship (if any) of Watson the computer’s performance on “Jeopardy!” to actual human thinking cited recent developments in artificial intelligence research, and wondered what Hubert Dreyfus, professor of philosophy in the graduate school at U.C. Berkeley and author of “What Computers Can’t Do,” would say about this new work. The editors and I have invited him and Sean Dorrance Kelly, professor and chair of the department of philosophy at Harvard University and co-author, with Professor Dreyfus, of “All Things Shining: Reading the Western Classics to Find Meaning in a Secular Age,” to reply. — Stanley Fish
By Sean Dorrance Kelly and Hubert Dreyfus
In last week’s column, Stanley Fish argued that Watson ain’t so smart. Watson, of course, is the IBM-built computer that recently won a game of “Jeopardy!” Despite this impressive achievement, Fish argued that Watson “does not come within a million miles of replicating the achievements of everyday human action and thought.” In defending this claim, Fish invoked arguments that one of us (Dreyfus) articulated almost 40 years ago in “What Computers Can’t Do,” a criticism of 1960s and 1970s style artificial intelligence.
Some of the most interesting comments on Fish’s column pointed out that AI research today is based on a dramatically different paradigm, and that therefore the old arguments against it no longer have any bite. We agree. And we agree also, as a number of other comments suggested, that Watson’s ability to process natural language, to resolve linguistic ambiguities and to evaluate the puns, slang, nuance and oblique allusions characteristic of a typical “Jeopardy!” question or category is impressive indeed. Nevertheless, the arrival of our new computer overlords is not exactly around the corner.
To begin with, consider the way things used to be. At the dawn of the AI era the dominant approach to creating intelligent systems was based on finding the right rules for the computer to follow. If you wanted a computer to play checkers, for instance, then you just needed to code up a system that had rules for how the pieces could move, had a rule about how to rank possible board positions and could generate a tree of possible board positions based on a given initial move. The philosopher John Haugeland, a former student of Dreyfus’s, called this rule-based approach GOFAI, for Good Old Fashioned Artificial Intelligence.
For constrained domains the GOFAI approach is a winning strategy. Already in the early 1960s researchers had developed a system that could play master-level checkers. And a few years ago it was announced that checkers has been solved. Dozens of computers, running almost continuously since 1989, mapped completely the 500 billion billion possible positions in the game of checkers, rendering it no more interesting than tic-tac-toe.
Even Deep Blue, the IBM system that beat the then-reigning world champion chess master Garry Kasparov in 1997, followed something like this brute force strategy. Chess is a much more complicated game than checkers, with an estimated 10^120 possible positions, so it cannot be solved in the way that checkers has been. But Deep Blue could follow out all the consequences of a given move to a depth of 20 or more possible responses, and this proved sufficient. Still, as commenter Bruce Macevoy and others point out, this is nothing like what humans do. And most AI researchers agree that there is nothing intelligent or even interesting about the brute force approach.
In many ways the AI community has taken this criticism to heart. As Lisa from Orlando points out, the dominant paradigm in AI research has largely “moved on from GOFAI to embodied, distributed intelligence.” And Faustus from Cincinnati insists that as a result “machines with bodies that experience the world and act on it” will be “able to achieve intelligence.”
The new, embodied paradigm in AI, deriving primarily from the work of roboticist Rodney Brooks, insists that the body is required for intelligence. Indeed, Brooks’s classic 1990 paper, “Elephants Don’t Play Chess,” rejected the very symbolic computation paradigm against which Dreyfus had railed, favoring instead a range of biologically inspired robots that could solve apparently simple, but actually quite complicated, problems like locomotion, grasping, navigation through physical environments and so on. To solve these problems, Brooks discovered that it was actually a disadvantage for the system to represent the status of the environment and respond to it on the basis of pre-programmed rules about what to do, as the traditional GOFAI systems had. Instead, Brooks insisted, “It is better to use the world as its own model.”
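To give a flavor of the contrast, here is a toy sketch, loosely in the spirit of Brooks’s idea rather than his actual code, of a reactive controller that keeps no internal map at all: on every cycle it senses the world afresh and lets the highest-priority behavior that fires determine the action. The sensor readings and behaviors are invented for illustration.

    import random

    def sense():
        # Invented sensor readings; on a real robot these would come from hardware.
        return {"bumper_hit": random.random() < 0.1,
                "obstacle_distance": random.uniform(0.0, 2.0)}

    def control_step(readings):
        # Behaviors in priority order: no stored map, no plan, just a direct
        # coupling from what is sensed right now to what is done right now.
        if readings["bumper_hit"]:
            return "back up and turn"
        if readings["obstacle_distance"] < 0.3:
            return "veer away"
        return "wander forward"

    for _ in range(5):
        print(control_step(sense()))

The world itself, freshly sensed on each pass through the loop, does the work that an internal representation would otherwise have to do.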
The adaptive, responsive, biologically inspired robots that Brooks and others have built may not inhabit the same world that we humans do. And to the extent that they have purposes, projects and expectations they are probably not much like ours. For one thing, although they respond to the physical world rather well, they tend to be oblivious to the global, social moods in which we find ourselves embedded essentially from birth, and in virtue of which things matter to us in the first place. Despite these perhaps insurmountable deficiencies, we find this approach to be a possible step in the right direction. But whether or not it is ultimately successful, the embodied AI paradigm is irrelevant to Watson. After all, Watson has no useful bodily interaction with the world at all.
That does not mean that Watson is completely uninteresting. The statistical machine learning strategies that it uses are indeed a big advance over traditional GOFAI techniques. But they still fall far short of what human beings do. To see this, take Watson’s most famous blunder. On day 2 of the “Jeopardy!” challenge, the competitors were given a clue in the category “U.S. Cities.” The clue was, “Its largest airport is named for a World War II hero; its second largest for a World War II battle.” Both Ken Jennings and Brad Rutter correctly answered Chicago. Watson didn’t just get the wrong city; its answer, Toronto, is not a United States city at all. This is the kind of blunder that practically no human being would make. It may be what motivated Jennings to say, “The illusion is that this computer is doing the same thing that a very good ‘Jeopardy!’ player would do. It’s not. It’s doing something sort of different that looks the same on the surface. And every so often you see the cracks.”
So what went wrong? David Ferrucci, the principal investigator on the IBM team, says that there were a variety of factors that led Watson astray. But one of them relates specifically to the machine learning strategies. During its training phase, Watson had learned that categories are only a weak indicator of the answer type. A clue in the category “U.S. Cities,” for example, might read “Rochester, New York, grew because of its location on this.” The answer, the Erie Canal, is obviously not a U.S. city. So from examples like this Watson learned, all else being equal, that it shouldn’t pay too much attention to mismatches between category and answer type.
The problem is that lack of attention to such a mismatch will sometimes produce a howler. Knowing when it’s relevant to pay attention to the mismatch and when it’s not is trivial for a human being. But Watson doesn’t understand relevance at all. It only measures statistical frequencies. Because it is relatively common to find mismatches of this sort, Watson learns to weigh them as only mild evidence against the answer. But the human just doesn’t do it that way. The human being sees immediately that the mismatch is irrelevant for the Erie Canal but essential for Toronto. Past frequency is simply no guide to relevance.
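A toy calculation makes the point vivid. The weights, features and numbers below are invented for illustration and are not Watson’s actual model; they simply show how, once a category/type mismatch has been learned as only weak negative evidence, a small edge in textual evidence can let a non-U.S. city beat the right answer.

    # Invented weights a learner might settle on from training examples.
    weights = {
        "passage_support": 2.0,       # how strongly retrieved text backs the answer
        "category_type_match": 0.3,   # small: mismatches were often harmless in training
    }

    # Invented feature values: suppose the text evidence slightly favored Toronto.
    candidates = {
        "Toronto": {"passage_support": 1.0, "category_type_match": 0.0},
        "Chicago": {"passage_support": 0.8, "category_type_match": 1.0},
    }

    def score(features):
        return sum(weights[name] * value for name, value in features.items())

    for name, features in candidates.items():
        print(f"{name}: {score(features):.2f}")
    # Toronto: 2.00, Chicago: 1.90 -- the mismatch penalty is too mild to
    # overturn a small edge in textual evidence, so the non-U.S. city wins.

No quantity in that calculation corresponds to noticing that the category makes Toronto impossible; the mismatch is just one more number, weighted by how often such mismatches mattered in the past.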
The fact is, things are relevant for human beings because at root we are beings for whom things matter. Relevance and mattering are two sides of the same coin. As Haugeland said, “The problem with computers is that they just don’t give a damn.” It is easy to pretend that computers can care about something if we focus on relatively narrow domains — like trivia games or chess — where by definition winning the game is the only thing that could matter, and the computer is programmed to win. But precisely because the criteria for success are so narrowly defined in these cases, they have nothing to do with what human beings are when they are at their best.
Far from being the paradigm of intelligence, therefore, mere matching with no sense of mattering or relevance is barely any kind of intelligence at all. As beings for whom the world already matters, our central human ability is to be able to see what matters when. But, as we show in our recent book, this is an existential achievement orders of magnitude more amazing and wonderful than any statistical treatment of bare facts could ever be. The greatest danger of Watson’s victory is not that it proves machines could be better versions of us, but that it tempts us to misunderstand ourselves as poorer versions of them.