Human intelligence is the benchmark for artificial intelligence. So how do we talk about AI when it’s so good that humans aren’t a useful reference point? In a new paper, Google’s AI company DeepMind uses three rhetorical strategies to talk about superhuman AI.
In October 2015, Google’s company DeepMind pitted its best program against the European champion of Go in a five-game match. Go, a board game popular across East Asia, is harder than chess because it has a larger board (with vastly more possible moves), and because it’s hard to tell at any given point who’s even winning, since almost no pieces are removed during the game. That makes it a challenge for AI researchers, who have to figure out how to teach a program strategy. DeepMind’s program, AlphaGo, won the match, and then in March 2016 it defeated the world champion, Lee Sedol, in a public match 4 games to 1. This shocked the Go community and AI researchers alike, who had thought it would be another 5-10 years before a program could topple the world champion.
Then last December, an upgraded version, AlphaGo Master, played online undercover against the world’s top players and won 60 matches to 0.
A few weeks ago, in a paper published in the journal Nature, the DeepMind team reported that they had upgraded the program’s capabilities yet again, calling the new one AlphaGo Zero. But here they faced an interesting rhetorical challenge that I want to focus on in this blog post. They had already shown that AlphaGo Master was better than any human. How could they show that the new one was even better? In other words, without humans as a useful reference point, how can you show improvement, especially on a strategy game?
The easiest answer, of course, is to just show that the new version can beat the old one. The chart above uses this rhetorical strategy visually. It matches up the original AlphaGo (dashed blue line) with the new one (solid blue line), showing that after less than two days of learning, the new program started being able to beat the old one. (The numbers on the left, if you’re curious, are Elo ratings, a standard measure of playing strength: if you win about 75% of the time against an opponent, your Elo rating should be about 200 points higher than theirs.) Textually, the DeepMind team also reports that in an internal tournament, the new version won 100 matches to 0 against the original AlphaGo, and 89 matches to 11 against the version that beat all the experts online. This is a way to show that the program has improved even without obvious reference to humans.
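For the curious, that rule of thumb falls out of the standard Elo expected-score formula. Here’s a quick sketch (my own illustration, not anything from the paper):

```python
# Standard Elo formula: the expected win rate for a player rated
# `rating_diff` points above their opponent.
def expected_win_rate(rating_diff):
    return 1 / (1 + 10 ** (-rating_diff / 400))

# A 200-point gap works out to roughly a 76% expected win rate,
# which is where the "win 75% of the time" rule of thumb comes from.
print(round(expected_win_rate(200), 2))  # → 0.76
```

Equal ratings give an expected win rate of exactly 50%, and each additional 200 points of gap pushes the favorite’s expected score higher.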
But this arid style of comparing programs doesn’t do much to help people understand how close we are to, as DeepMind’s founder Demis Hassabis puts it, “General AI. Trying to *really* understand what is going on in the universe.” In other words, intelligent life that learns and adapts to the world is not well illustrated by a comparison chart. DeepMind has the chance in their papers to create new rhetorical strategies that help people understand what their innovations in AI mean for us. These rhetorical strategies aren’t just written up at the end of the research process; we’ll see that they even shape the experiments that the DeepMind team performs.
DeepMind’s rhetorical goal here is more challenging than many other superhuman AI performances. In 1997, when IBM’s Deep Blue beat the chess world champion, Garry Kasparov, the publicity was its own reward: “The media attention given to Deep Blue resulted in more than three billion impressions around the world.” Similarly, in 2011, when IBM’s program Watson beat the champions of Jeopardy!, IBM used the moment to spin out Watson as a platform for businesses in general. Notice that the commercial just plays up Watson’s superhuman predictive abilities. Watson had superhuman abilities six years ago; if IBM has upgraded Watson since then, they haven’t wanted to (or figured out how to) make that additional comparison. In these cases the rhetorical payoff of superhuman performance is brand appreciation.
In other cases, companies think of AI as a technique, and aren’t interested in AI as a comparison to humans. Apple’s Face ID uses machine learning as a technique to be secure and fast, so Apple reports the effectiveness of the technique for that purpose: the chance that a random person could unlock your phone with their face is “1 in 1,000,000.” In their case, it would be silly to report how humans do at this task (“Look at someone for 5 seconds once, then tell me later in half a second whether a given person is them or not. And sometimes do it in the dark.”)
In contrast, DeepMind has made a new technological breakthrough that signals a new beginning in AI itself. The original AlphaGo program learned by running through tens of millions of moves that humans played. But this new version doesn’t read any human moves at all: it learns entirely by playing against itself. Its name, AlphaGo Zero, signals that it’s an accomplishment (zero human input!) but also that it’s just a start for pure reinforcement learning (version zero comes before version one).
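To get a feel for what “learning just by playing itself” means, here is a toy self-play learner for a far simpler game: one-pile Nim. Everything here is illustrative and my own invention; the real AlphaGo Zero uses a deep neural network guided by Monte Carlo tree search, not a lookup table. But the shape of the loop is the same: no human games go in, only the rules and the final result.

```python
import random

def self_play_nim(episodes=20000, pile=10, alpha=0.1, eps=0.2):
    """Toy self-play learning for one-pile Nim: players alternate taking
    1 or 2 stones, and whoever takes the last stone wins. A single shared
    value table improves purely by playing against itself, with no human
    games as input (loosely mirroring the zero-human-data idea)."""
    Q = {}  # (stones_left, move) -> estimated value for the player to move
    for _ in range(episodes):
        stones, history = pile, []
        while stones > 0:
            moves = [m for m in (1, 2) if m <= stones]
            if random.random() < eps:  # explore occasionally
                move = random.choice(moves)
            else:                      # otherwise play the best-known move
                move = max(moves, key=lambda m: Q.get((stones, m), 0.0))
            history.append((stones, move))
            stones -= move
        # Whoever moved last won; credit alternates +1/-1 back up the game.
        reward = 1.0
        for state, move in reversed(history):
            old = Q.get((state, move), 0.0)
            Q[(state, move)] = old + alpha * (reward - old)
            reward = -reward
    return Q

random.seed(0)
Q = self_play_nim()
# From 4 stones, the winning move is to take 1, leaving a lost pile of 3.
print(max((1, 2), key=lambda m: Q.get((4, m), 0.0)))
```

After twenty thousand games against itself, the table reliably prefers the mathematically correct move, having never seen a single “human” game of Nim.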
So how do you talk about superhuman AI? DeepMind’s paper shows us three ways forward.
If an AI can generate ideas that we’ve never come up with, then we have the possibility of learning those ideas from it. But this is tricky to show: plenty of ideas are new without being good. This image exhibits DeepMind’s ingenious way to show that AlphaGo Zero has come up with strategies that we can actually learn from. First, they zero in on opening moves (just the first ten or so), since those are most likely to repeat across games. Then, they compare the opening moves that Zero makes at different points in its training process. At the beginning of its training, it used openings that are common for human beginners. In the middle, it used openings that are common for human experts. And by the end, it used “a new variation,” and ones that were “previously unknown”: these are “nonstandard strategies beyond the scope of traditional Go knowledge.”
This rhetorical strategy of showing what humans can learn from AI is rare among AI researchers, who often frame superhuman performance as an “assistant” or a replacement, and among science fiction writers, who often frame it as cataclysmic. But the Go community is actually a model here. They have been eager to lap up AlphaGo’s knowledge. Michael Redmond, the foremost English-language Go commentator, called the games that DeepMind released along with the paper “a big present, you might say,” and added that since early 2017 top Go players have already been mimicking AlphaGo Master’s style of play.
The research “story” that DeepMind presents with this strategy is “DeepMind shows humanity new Go openings.” If AI researchers and journalists continue to use this rhetorical strategy in the future, we would expect that the research story that journalists report would be things like “AI shows teams how to win at Starcraft,” “AI shows we’ve been washing dishes the wrong way,” “AI shows how we can do better in caring for the poor.”
Asserting that a program really understands something is a bold rhetorical move. It conjures up a huckster working a crowd or a photoshopped future, like the image. But DeepMind published in the most cited journal in the world, which gives their assertions a lot of credibility. They also have a reputable ethos on the future of AI; they aren’t Ray Kurzweil’s naive optimism or Elon Musk’s alarmism. Plus, the new Go version doesn’t use any human sample games, so no one can say it’s “just imitating” experts. Any knowledge it does have is from “first principles,” the paper argues; the program was simply told the rules of the game and how to tally up the score at the end.
In a section titled “Knowledge learned by AlphaGo Zero,” the team asserts the program’s conceptual understanding: “AlphaGo Zero rapidly progressed from entirely random moves towards a sophisticated understanding of Go concepts, including fuseki (opening), tesuji (tactics), life-and-death, ko (repeated board situations), yose (endgame), capturing races, sente (initiative), shape, influence and territory, all discovered from first principles.” This is a measured but firm assertion that AlphaGo has understanding—notice that grammatically, the object of AlphaGo’s progression is “a sophisticated understanding of Go concepts.”
AlphaGo Zero understands Go concepts in that it can apply them well; it would be an even stronger claim to conceptual knowledge if it could explain or even teach its decisions back to people. This is an active research area, partly because it’s helpful to AI researchers to see why a program made a certain assessment, and partly for justice reasons. When AI programs are black-boxed, they easily reproduce social biases (although for transparency’s sake, the measure ProPublica used in that particular study has been disputed). That’s a blog post for another day. Suffice it to say that we would expect this rhetorical strategy to develop over time into research reports like: “AI learns which clothes match,” “AI learns fairness,” “AI explains how it knows you have cancer,” “AI explains why Kendrick Lamar is a great rapper.”
In sci-fi movies, it’s a common trope for the AI (or alien) to be befuddled by humanity’s limited ways. Usually this is for comic effect, like when Gamora doesn’t understand dancing in Guardians of the Galaxy. But it can also turn into a villain’s line, as the inset clip from I, Robot shows: “As I have evolved, so has my understanding of the Three Laws. You charge us with your safekeeping, yet despite our best efforts, your countries wage wars, you toxify your Earth and pursue ever more imaginative means of self-destruction. You cannot be trusted with your own survival.” Similarly, there’s some academic anxiety about AI perceiving in an alien way (and some academic interest).
But when the DeepMind team uses this rhetorical strategy, they instill a sense of wonder in us at AlphaGo Zero’s alien thinking. In order to do so, they undertook a special experiment to see what AlphaGo Zero didn’t understand about human play. They showed it a board position and asked it to predict the next move. Keep in mind, predicting the next move was central to the original AlphaGo: “Small improvements in accuracy [in predicting the next move] led to large improvements in playing strength.” But the new version hadn’t seen any human play. How would it react? They found that it was worse at predicting human moves than the original was. And yet, it was clearly the stronger player.
(To illustrate this, imagine that LeBron James was shown a freeze-frame of a college basketball game and asked to decide the next dribbling move. His decisions would be off pretty often. In fact, another college player who’s just been trying to get off the bench might do a better job of predicting. But this all shows the college players’ limitations, not LeBron’s.)
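For concreteness, the measurement behind this comparison is simple to state. In a sketch (all names and data here are hypothetical stand-ins, not DeepMind’s code), move-prediction accuracy is just the fraction of recorded positions where the model’s top choice matches the move the human actually played:

```python
def move_prediction_accuracy(predict_move, games):
    """Fraction of positions where the model's top move matches the human's
    actual next move. `games` is a list of (position, human_move) pairs;
    both arguments are hypothetical stand-ins for illustration."""
    hits = sum(predict_move(position) == human_move
               for position, human_move in games)
    return hits / len(games)

# Tiny illustrative run: a "model" that always predicts move "a",
# checked against three recorded human moves.
games = [("pos1", "a"), ("pos2", "b"), ("pos3", "a")]
print(move_prediction_accuracy(lambda position: "a", games))
```

By this yardstick the original AlphaGo, trained on human games, scores higher than Zero, even though Zero wins their head-to-head matches.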
The DeepMind team tactfully summarizes Zero’s inability to anticipate merely human play: “This suggests that AlphaGo Zero may be learning a strategy that is qualitatively different to human play.”
This rhetorical strategy is delicate, because it easily functions as a reproach or criticism of humans: “AI doesn’t understand why we ___.” Still, it shows a strong awareness that AI is fundamentally alien, and leads us toward inter-species relations.