AlphaGo, the human Go slayer, has just been demolished by new AI

Isaac Cain
October 20, 2017

To train, AlphaGo Zero played millions of games against itself and, within the space of three days, had surpassed the version of AlphaGo that defeated world champion Lee Sedol, beating it 100 games to nil.

While the new system makes strides on the self-training challenge, Windsor expressed doubts that it addresses the third challenge (automated model building), because it reuses a model design from the previous version of AlphaGo.

"The most striking thing is we don't need any human data anymore", says Demis Hassabis, CEO and cofounder of DeepMind.

"If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives".

Singh does not think we should be anxious about the increasing abilities of artificial intelligence compared to what we can do as humans. The original AlphaGo had been trained using a combination of supervised learning, based on millions of human moves by both amateurs and experts, and reinforcement learning, honing what it had learnt by playing against itself. The total number of different arrangements of a Go board is on the order of 10^170, more than the total number of atoms in the observable universe; chess, by comparison, has a comparatively puny (though still astronomical) number of possible games, often estimated at around 10^120. There are far too many possibilities to tackle with the brute-force processing that computers are naturally good at. Eventually, the program figured out the best "winning moves" for most situations, and it would play those against any human player.
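To get a feel for the scale of those numbers, a quick back-of-the-envelope calculation helps. The figures below (10^170 Go board arrangements, roughly 10^80 atoms in the observable universe) are the commonly cited estimates rather than exact counts:

```python
# Back-of-the-envelope comparison of the search-space sizes discussed above.
go_positions = 10 ** 170        # commonly cited estimate for Go board arrangements
atoms_in_universe = 10 ** 80    # rough standard estimate
chess_games = 10 ** 120         # Shannon's classic estimate for chess games

print(go_positions > atoms_in_universe)    # True
print(go_positions // atoms_in_universe)   # 10^90 -- Go dwarfs even the atom count
```

Even against chess's enormous game tree, Go's search space is larger by a factor of around 10^50, which is why exhaustive search is a non-starter.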

Twenty-one days later, it had bested AlphaGo Master, the version that earlier this year defeated the world number one, Ke Jie, and 60 other Go professionals.


The efficiency of the learning process owes to a feedback loop. Rather than searching every possible sequence of moves exhaustively, the system selectively prunes branches by deciding which paths seem most promising. It proceeds using reinforcement learning, combining its neural network with a powerful search algorithm. Earlier versions, by contrast, took several months to train.
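The "prune by promise" idea can be sketched as a best-first tree search in the style of Monte Carlo tree search, which is the family of algorithms AlphaGo's search belongs to. Everything here is illustrative: the `Node` class, the UCB scoring constant, and `rollout_value` (a stand-in for the learned value network) are assumptions for the sketch, not DeepMind's code.

```python
import math
import random

random.seed(0)

class Node:
    """One position in the search tree."""
    def __init__(self, state):
        self.state = state
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def ucb(self, parent_visits, c=1.4):
        # Unvisited branches score infinity so each gets tried once.
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits          # average observed value
        explore = c * math.sqrt(math.log(parent_visits) / self.visits)
        return exploit + explore

def rollout_value(state):
    # Stand-in for a learned value network: a toy heuristic plus noise.
    return state + random.random()

def search(root, expand, sims=200):
    for _ in range(sims):
        # Selection: walk down, always following the most promising branch.
        path, node = [root], root
        while node.children:
            node = max(node.children, key=lambda ch: ch.ucb(node.visits))
            path.append(node)
        # Expansion: add this leaf's successors.
        for s in expand(node.state):
            node.children.append(Node(s))
        # Evaluation + backup: credit every node on the visited path.
        value = rollout_value(node.state)
        for n in path:
            n.visits += 1
            n.value_sum += value
    # The most-visited child is the move the search trusts most.
    return max(root.children, key=lambda ch: ch.visits)

root = Node(0)
best = search(root, expand=lambda s: [s + 1, s * 2], sims=200)
```

The key property is that simulation effort concentrates on branches with high estimated value, so most of the tree is never expanded at all.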

The latest iteration, however, differs from its predecessors: AlphaGo Zero abandons all hand-engineered features, runs only one neural network (versus the two found in earlier models), and relies exclusively on its own knowledge to evaluate positions. It is much more sophisticated in how its neural network is tuned and updated to predict moves and the eventual victor of the games. "The architecture is simpler, yet more powerful, than previous versions", he says.
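The two-headed, single-network design can be pictured as one shared body feeding a policy head (a probability distribution over moves) and a value head (a scalar predicting the eventual winner). The toy forward pass below is a minimal sketch of that shape only; the layer sizes, weights, and names are invented for illustration, and the real network is a deep residual convolutional model.

```python
import math
import random

random.seed(0)
N_FEATURES, N_MOVES = 8, 4

# Made-up weights standing in for a trained network.
W_body = [[random.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(N_FEATURES)]
W_policy = [[random.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(N_MOVES)]
W_value = [random.uniform(-1, 1) for _ in range(N_FEATURES)]

def forward(board_features):
    # Shared body: one linear layer with a ReLU nonlinearity.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, board_features)))
              for row in W_body]
    # Policy head: softmax over candidate moves.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_policy]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    policy = [e / sum(exps) for e in exps]
    # Value head: a scalar in [-1, 1] predicting the eventual winner.
    value = math.tanh(sum(w * h for w, h in zip(W_value, hidden)))
    return policy, value

policy, value = forward([random.random() for _ in range(N_FEATURES)])
```

Sharing one body between both heads means a single set of features must serve both "which move looks good" and "who is winning", which is part of what made the unified network simpler than the two-network design it replaced.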

"Now we have the final version of AlphaGo, AlphaGo Zero, which has learned completely from scratch, from first principles", Professor Silver said in a company video. This involved feeding it just the rules of the ancient Chinese game and let it figure out how to play. "As a result, a long-standing ambition of AI research is to bypass this step, creating algorithms that achieve superhuman performance in the most challenging domains with no human input". Zero only understood that concept later in its training, according to DeepMind's paper.

The program started out placing stones on the board at random. It formed its own strategies and optimised for winning outcomes without studying any prior examples. But by the 70-hour mark, it had developed a mature, sophisticated style.
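The trajectory from random play to competent play via nothing but self-play can be shown in miniature on a trivial game. The sketch below uses single-pile Nim (take one or two stones; whoever takes the last stone wins) and a simple tabular value update in place of Go and deep networks; the game choice, learning rate, and exploration schedule are all illustrative assumptions.

```python
import random

random.seed(1)
Q = {}  # (stones_left, move) -> estimated value for the player to move

def choose(stones, eps):
    moves = [m for m in (1, 2) if m <= stones]
    if random.random() < eps:
        return random.choice(moves)  # explore: pure randomness early on
    return max(moves, key=lambda m: Q.get((stones, m), 0.0))

def self_play_game(eps, lr=0.5):
    stones, history = 9, []
    while stones > 0:
        move = choose(stones, eps)
        history.append((stones, move))
        stones -= move
    # The player who took the last stone wins; walk backwards through the
    # game, crediting alternate moves with alternating signs.
    reward = 1.0
    for state_move in reversed(history):
        old = Q.get(state_move, 0.0)
        Q[state_move] = old + lr * (reward - old)
        reward = -reward

# Start fully random, then gradually trust the learned values more.
for episode in range(5000):
    self_play_game(eps=max(0.05, 1.0 - episode / 2500))
```

After training, the table prefers the mathematically correct moves (for instance, taking both stones when two remain, and leaving the opponent a multiple of three) even though no expert games were ever shown to it, which is the same shift, in miniature, that the article describes between hour zero and hour seventy.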

Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. It started off trying greedily to capture stones, as beginners often do, but after three days it had mastered complex tactics used by human experts.
