The newest AlphaGo mastered the game with no human input

The computer program even devised new strategies previously unknown to human players

GAME CHANGER The first completely self-taught AlphaGo program tops all of its machine-learning predecessors, no contest.

DeepMind

By Maria Temming

October 18, 2017 at 1:00 pm

AlphaGo just leveled up.

The latest version of the computer program, dubbed AlphaGo Zero, is the first to master Go, a notoriously complex Chinese board game, without human guidance. Its predecessor, dubbed AlphaGo Lee when it became the first computer program with artificial intelligence, or AI, to defeat a human world champion Go player (SN Online: 3/15/16), had to study millions of examples of human expert moves before playing practice games against itself. AlphaGo Zero trained solely through self-play, starting with completely random moves. After a few days’ practice, AlphaGo Zero trounced AlphaGo Lee 100 games to none, researchers report in the Oct. 19 Nature.

“The results are stunning,” says Jonathan Schaeffer, a computer scientist at the University of Alberta in Edmonton, Canada, who wasn’t involved in the work. “We’re talking about a revolutionary change.”

AI programs like AlphaGo Zero that can gain mastery of various tasks without human input may be able to solve problems where human expertise falls short, says Satinder Singh, a computer scientist at the University of Michigan in Ann Arbor. For instance, computer programs with superhuman smarts could find new cures for diseases, design more energy-efficient technology or invent new materials.

AlphaGo Zero’s creators at Google DeepMind designed the computer program to use a tactic during practice games that AlphaGo Lee didn’t have access to. For each turn, AlphaGo Zero drew on its past experience to predict the most likely ways the rest of the game could play out, judge which player would win in each scenario and choose its move accordingly.

AlphaGo Lee used this kind of forethought in matches against other players, but not during practice games. AlphaGo Zero’s ability to imagine and assess possible futures during training “allowed it to train faster, but also become a better player in the end,” explains Singh, whose commentary on the study appears in the same issue of Nature.
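To make the idea of looking ahead concrete, here is a minimal Python sketch of the general pattern described above: for each possible move, imagine how the rest of the game might play out, estimate who wins, and pick the move with the best outlook. This is not DeepMind's code and not AlphaGo Zero's actual search. The toy stone-taking game, the purely random playouts and every function name here are assumptions chosen only for illustration.

```python
import random


def legal_moves(stones):
    """Toy game (an assumption for illustration): take 1, 2 or 3 stones."""
    return [n for n in (1, 2, 3) if n <= stones]


def random_playout(stones, player_to_move, rng):
    """Play random moves until the pile is empty and return the winner (0 or 1).

    Whoever takes the last stone wins. Requires stones > 0.
    """
    player = player_to_move
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player
        player = 1 - player


def choose_move(stones, player, n_playouts=2000, seed=0):
    """Pick the move whose imagined futures end in a win most often."""
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            rate = 1.0  # taking the last stone wins on the spot
        else:
            # Imagine many possible continuations after this move
            # and count how often `player` ends up winning.
            wins = sum(
                random_playout(remaining, 1 - player, rng) == player
                for _ in range(n_playouts)
            )
            rate = wins / n_playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move, best_rate


if __name__ == "__main__":
    move, rate = choose_move(stones=3, player=0)
    print(f"take {move} stone(s); estimated win rate {rate:.2f}")
```

In this sketch the imagined futures are random. The article describes AlphaGo Zero as drawing on its past experience to predict the most likely continuations, which is a far more informed way of choosing which futures to consider.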

AlphaGo Zero played 4.9 million practice games over three days before roundly defeating AlphaGo Lee. In comparison, AlphaGo Lee’s training period took several months (SN: 12/24/16, p. 28). While practicing, AlphaGo Zero not only discovered many of the Go strategies that humans have come up with over thousands of years, but also devised new game plans previously unknown to human players.

“To AlphaGo Zero, the world human champion is a novice,” Schaeffer says. But despite its incredible Go-playing prowess, AlphaGo Zero is still “an idiot savant” that can’t do anything except play Go, he says. If AI programs are going to make superhuman contributions to engineering or medicine, they’ll have to be more general-purpose problem-solvers that can teach themselves a wide variety of tasks.