In October 2015, AlphaGo became the first computer Go program to beat a human professional Go player without handicaps on a full-sized 19×19 board. In March 2016, it beat Lee Sedol in a five-game match. At the 2017 Future of Go Summit, AlphaGo beat Ke Jie, the world No.1 ranked player at the time, in a three-game match.
Go was considered a difficult game for computers to master because of its sheer complexity: the number of possible board positions, about 10^170 and far more than in chess, is greater than the number of atoms in the universe.
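To get a rough feel for that scale, here is a small sketch of the arithmetic. It only computes a loose upper bound (every point on the board is empty, black, or white), so it also counts illegal positions and comes out larger than the figure above. The 10^80 value for atoms is a commonly quoted estimate that I am assuming here, not a number from DeepMind.

```python
# Each of the 19 x 19 points is empty, black, or white, so 3 ** 361
# is a loose upper bound on the number of board configurations.
BOARD_POINTS = 19 * 19
upper_bound = 3 ** BOARD_POINTS

# Assumption: a commonly quoted rough estimate for atoms in the universe.
ATOMS_ESTIMATE = 10 ** 80

digits = len(str(upper_bound))
print(f"Upper bound on Go positions has {digits} digits")
print("Larger than the atom estimate:", upper_bound > ATOMS_ESTIMATE)
```

Python integers have arbitrary precision, so the comparison is exact rather than a floating-point approximation.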
After beating Ke Jie earlier this year, DeepMind announced that AlphaGo was retiring from future competitions. More recently, DeepMind published a paper describing AlphaGo Zero, a leaner and meaner version of AlphaGo, the artificially intelligent program that crushed professional Go players.
You may be wondering what AlphaGo Zero improves on compared with AlphaGo. Why can it be called a BREAKTHROUGH in AI?
The answer here is SELF-LEARNING.
Previous versions of AlphaGo were initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed the human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.
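The idea of learning only from self-play can be illustrated with a toy sketch. This is not AlphaGo Zero's method (which pairs a neural network with tree search); it swaps in a simple lookup table and a much smaller game, single-pile Nim, where players take 1 to 3 stones and whoever takes the last stone wins. All names below are hypothetical and chosen for illustration.

```python
import random

MAX_TAKE = 3  # a player may remove 1 to 3 stones per turn


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def choose_move(q, stones, rng, explore):
    """Mostly pick the best-known move; sometimes explore at random."""
    moves = legal_moves(stones)
    if rng.random() < explore:
        return rng.choice(moves)
    return max(moves, key=lambda m: q.get((stones, m), 0.0))


def self_play_game(q, counts, start, rng, explore):
    """Play one game against itself, then update the shared table."""
    history = []  # (stones, move) for each turn, players alternating
    stones = start
    while stones > 0:
        move = choose_move(q, stones, rng, explore)
        history.append((stones, move))
        stones -= move
    # Whoever made the final move took the last stone and wins (+1).
    outcome = 1.0
    for state_move in reversed(history):
        counts[state_move] = counts.get(state_move, 0) + 1
        old = q.get(state_move, 0.0)
        # Running average of the results seen after this move.
        q[state_move] = old + (outcome - old) / counts[state_move]
        outcome = -outcome  # the previous turn belonged to the opponent


def train(games=5000, seed=0):
    rng = random.Random(seed)
    q, counts = {}, {}  # the table starts empty: no human examples
    for _ in range(games):
        self_play_game(q, counts, rng.randint(1, 12), rng, explore=0.2)
    return q


q_table = train()
best_moves = {
    s: choose_move(q_table, s, random.Random(0), explore=0.0)
    for s in range(1, 8)
}
print(best_moves)
```

The table begins with no knowledge at all, and both "players" share it, so every game the program plays against itself improves the opponent it will face next. That feedback loop is the part this sketch has in common with the approach described above.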
DeepMind used a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher. You might say, “Okay, AlphaGo Zero can teach itself. Why is this so great?”
Notice that the new technique means AlphaGo Zero is no longer constrained by the limits of human knowledge. Also, doesn't it feel a lot like how the human brain works? When you are in a new environment, you start to explore it, gather knowledge from it, and teach yourself to understand it. The new technique is, in fact, a sign of AI getting closer to the human brain.
In addition, AlphaGo Zero uses one neural network rather than two. Earlier versions of AlphaGo used a “policy network” to select the next move to play and a “value network” to predict the winner of the game from each position. In AlphaGo Zero these are combined into a single network, allowing it to be trained and evaluated more efficiently.
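A rough sketch of what "one network with two jobs" can look like: a shared layer feeds both a policy output (a probability for each move) and a value output (a single number predicting the winner). This is a tiny, untrained toy on an assumed 3×3 board with made-up layer sizes, not DeepMind's architecture; the class and variable names are hypothetical.

```python
import math
import random

BOARD_POINTS = 9  # hypothetical toy 3x3 board; real Go has 361 points
HIDDEN = 16       # hypothetical size of the shared layer


def make_matrix(rows, cols, rng):
    """Small random weights, rows x cols."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TwoHeadedNet:
    """One shared body, two outputs: move probabilities and a value."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.shared = make_matrix(HIDDEN, BOARD_POINTS, rng)
        self.policy_head = make_matrix(BOARD_POINTS, HIDDEN, rng)
        self.value_head = make_matrix(1, HIDDEN, rng)

    def forward(self, board):
        # Both heads read the same hidden features of the position.
        hidden = [math.tanh(h) for h in matvec(self.shared, board)]
        move_probs = softmax(matvec(self.policy_head, hidden))
        value = math.tanh(matvec(self.value_head, hidden)[0])
        return move_probs, value


net = TwoHeadedNet()
# +1 marks our stone, -1 the opponent's, 0 an empty point.
board = [1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
move_probs, value = net.forward(board)
print(len(move_probs), round(sum(move_probs), 6), value)
```

Because the two heads share the same hidden layer, the position only has to be processed once to get both answers, which hints at why a combined network can be cheaper to train and evaluate than two separate ones.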
The logic behind AlphaGo Zero imitates how humans think and learn, which makes it more general, and generality is the whole purpose of studying AI.
We want AI to help humans with a huge range of tasks, from housework, driving, and laundry at the personal level to financial work and supply chain support at the industry level. While AlphaGo Zero is a step towards a general-purpose AI, it can only work on problems that can be perfectly simulated in a computer. Right now, people are only researching AI one area at a time. AIs that match humans at a huge range of tasks are still a long way off.
We have discussed a lot in class whether AI would be a threat to human beings. Everyone has their own opinion. For my part, I still believe the benefits we get from AI will far outweigh its threats to us. And AlphaGo Zero really is a big surprise: it shows how fast the technology is developing and how much humans have achieved.