ALPHAZERO
1. Intro
1.1. About DeepMind and earlier versions of AlphaZero
1.2. Why chess?
1.3. How everything started
2. About AlphaZero
2.1. Computing power
2.2. How it learns
2.2.1. Neural networks
2.3. Win against Stockfish and Elmo
3. Generalisation
1. Intro
1.2. Why chess?
The game of chess is the most widely studied domain in the history of artificial
intelligence. The study of computer chess is as old as computer science itself.
Charles Babbage (often called the father of the computer), Alan Turing (father of
theoretical computer science and artificial intelligence), Claude Shannon (founder
of information theory) and John von Neumann (creator of the von Neumann
architecture, the basis of most modern computer designs) all devised hardware,
algorithms and theory to analyse and play the game of chess.
2. About AlphaZero
When IBM’s supercomputer Deep Blue beat Garry Kasparov in 1997, it did so with
chess knowledge programmed into it by human experts. AlphaZero, by contrast,
learned completely on its own.
DeepMind said the difference between AlphaZero and its rivals is that its
machine-learning approach is given no human input apart from the basic rules of
chess.
One of the key advances here is that AlphaZero wasn’t specifically designed to
play any of the games it learned (chess, shogi and Go). In each case, it was given
the basic rules (such as how the queen moves in chess) but was programmed with
no other strategies or tactics.
Four hours and 44 million games of split-personality chess (AlphaZero playing
against itself) later, AlphaZero had taught itself enough to become the strongest
chess player, exceeding Stockfish’s rating.
A neural network is a computing system loosely modelled on the human brain. The
current position on the chessboard is fed in as the input. It is processed by the
first layer of neurons, each of which sends its output to every neuron in the next
layer, and so on until the last layer produces the final output.
A neuron is a very simple processing unit that accepts a number of inputs,
multiplies each one by a particular weight, sums the results and then applies an
activation function, such as the sigmoid, that gives an output in the range 0 to 1.
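The neuron and layer description above can be sketched in a few lines of Python. This is a minimal illustration, not AlphaZero's code: it assumes a sigmoid activation and adds a bias term (a common addition not mentioned above, defaulting to 0), and the network `toy_network` with its weights is purely hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Multiply each input by its weight, sum the results (plus a bias), then apply the activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """A fully connected layer: every neuron receives every input."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, network):
    """Pass the input through each layer in turn; each layer's output feeds the next layer."""
    activations = inputs
    for weight_rows, biases in network:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
toy_network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
print(forward([1.0, 0.0, 1.0], toy_network))
```

A real chess network encodes the board as many input numbers and learns its weights during training; here the weights are fixed by hand only to show how data flows from layer to layer.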
The architecture of the AlphaZero program is based on a single neural network with
two outputs: a "policy" output that proposes candidate moves, and a "value" output
that evaluates positions. (Earlier programs such as AlphaGo used a separate
"policy network" and "value network".)
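The idea of one network with two outputs can be sketched as a shared "trunk" feeding a policy head (probabilities over legal moves) and a value head (a single score between -1 for a loss and 1 for a win). This is a hedged toy example: the class `PolicyValueNet`, the integer move indices and the tiny fully connected layers with random weights are hypothetical simplifications, whereas the real AlphaZero network is a much larger convolutional network trained by self-play.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


def identity(x: float) -> float:
    return x


def dense(inputs, weights, biases, activation):
    """Fully connected layer: weighted sum of all inputs plus a bias, then the activation."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw scores into probabilities summing to 1 (shifted by the max for numerical stability)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PolicyValueNet:
    """Hypothetical toy network: one shared trunk feeding a policy head and a value head."""

    def __init__(self, n_inputs: int, n_hidden: int, n_moves: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.trunk_w, self.trunk_b = init(n_hidden, n_inputs), [0.0] * n_hidden
        self.policy_w, self.policy_b = init(n_moves, n_hidden), [0.0] * n_moves
        self.value_w, self.value_b = init(1, n_hidden), [0.0]

    def __call__(self, position, legal_moves):
        # Shared trunk: both heads read the same learned features of the position.
        features = dense(position, self.trunk_w, self.trunk_b, relu)
        # Policy head: one score per move index, restricted to legal moves, then normalised.
        logits = dense(features, self.policy_w, self.policy_b, identity)
        probs = softmax([logits[m] for m in legal_moves])
        policy = dict(zip(legal_moves, probs))
        # Value head: a single number squashed by tanh into [-1, 1] (loss to win).
        value = dense(features, self.value_w, self.value_b, math.tanh)[0]
        return policy, value


net = PolicyValueNet(n_inputs=4, n_hidden=8, n_moves=5)
policy, value = net([1.0, 0.0, 0.0, 1.0], legal_moves=[0, 2, 3])
print(policy, value)
```

Sharing the trunk means the features useful for choosing moves and for judging positions are learned together, which is the motivation for merging the two networks into one.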
AlphaZero’s neural network has up to 80 layers and hundreds of thousands of
neurons.
2.3. Win against Stockfish and Elmo
To demonstrate AlphaZero’s superiority over previous chess engines, a 100-game
match against Stockfish was played. Stockfish was a reasonable choice of opponent,
being open source and one of the strongest chess engines available.
Yet Stockfish, winner of the 2016 TCEC Championship and the 2017 Chess.com
Computer Chess Championship, didn't stand a chance.
AlphaZero won the closed-door, 100-game match with 28 wins, 72 draws, and
zero losses.
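The match result can be turned into a single score using standard chess scoring (1 point per win, half a point per draw): 28 + 72 / 2 = 64 points out of 100. Under the usual logistic Elo model, a 64% score corresponds to a rating gap of roughly 100 Elo. The sketch below assumes that model; the helper names are hypothetical, and one 100-game match gives only a rough estimate, not an official rating.

```python
import math


def match_points(wins: int, draws: int, losses: int) -> float:
    """Standard chess scoring: 1 point per win, half a point per draw, 0 per loss."""
    return wins + 0.5 * draws


def elo_difference(score: float) -> float:
    """Rating gap D for which the Elo expectation 1 / (1 + 10**(-D / 400)) equals `score`."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


wins, draws, losses = 28, 72, 0
games = wins + draws + losses
points = match_points(wins, draws, losses)
score = points / games
# prints: 64.0 points out of 100 games, score 64%, about 100 Elo
print(f"{points} points out of {games} games, score {score:.0%}, "
      f"about {elo_difference(score):.0f} Elo")
```

Note that the many draws pull the score towards 50%, so even an unbeaten 28-win result translates into a gap of about 100 Elo rather than an overwhelming one.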
What do you do if you are a thing that never tires and have just mastered a
1,400-year-old game? You conquer another one. After the Stockfish match,
AlphaZero "trained" for only two hours and then beat the best Shogi-playing
computer program, "Elmo."
3. Generalisation
One of the main claims of AlphaZero is that it uses a general-purpose learning
algorithm that can work in many domains.
DeepMind eventually wants to use the algorithm to tackle health problems. The
company believes the algorithm could find cures for major illnesses in a matter of
days or weeks, discoveries that would otherwise take humans hundreds of years.
The company has already begun using AlphaZero to study protein folding and has
promised it will soon publish new findings. Misfolded proteins are responsible for
many devastating diseases, including Alzheimer’s, Parkinson’s and cystic fibrosis.
However, it seems unrealistic to assume that many real-life situations can be
reduced to a fixed, predefined set of rules, as is the case in chess, Go or Shogi.