Mastering the game of Go with deep neural networks and tree search

Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis

doi:10.1038/nature16961

Article
Published: 27 January 2016

Nature volume 529, pages 484–489 (2016)

Abstract

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
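The abstract describes the search only at a high level, so the following is a minimal sketch of one way a tree search might combine a policy network's move priors, a value network's position estimate and a Monte Carlo rollout result. The names `Edge`, `select_action`, `leaf_value` and `backup`, the exploration constant `c_puct`, the mixing weight `lam` and the numbers in the usage note are illustrative assumptions, not details taken from this page.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Statistics for one move (edge) out of a search-tree node."""
    prior: float               # move probability from a policy network (toy value here)
    visits: int = 0            # number of simulations that passed through this edge
    total_value: float = 0.0   # sum of leaf evaluations backed up through this edge

    @property
    def q(self) -> float:
        """Mean backed-up value; 0.0 for an unvisited edge."""
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges: Dict[str, Edge], c_puct: float = 5.0) -> str:
    """Choose the move maximising q + u.

    The bonus u grows with the prior and shrinks as the edge is visited,
    so high-prior moves are explored first and well-explored moves are
    judged mostly on their mean value. (Assumed form, hypothetical constant.)
    """
    total_visits = sum(e.visits for e in edges.values())

    def score(edge: Edge) -> float:
        u = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.q + u

    return max(edges, key=lambda move: score(edges[move]))


def leaf_value(value_estimate: float, rollout_outcome: float, lam: float = 0.5) -> float:
    """Blend a value-network estimate with a rollout result using weight lam."""
    return (1.0 - lam) * value_estimate + lam * rollout_outcome


def backup(edge: Edge, value: float) -> None:
    """Record one simulation's leaf evaluation on an edge."""
    edge.visits += 1
    edge.total_value += value
```

As a usage example, with `edges = {"a": Edge(prior=0.6, visits=10, total_value=5.0), "b": Edge(prior=0.4)}`, `select_action(edges)` picks the unvisited move `"b"` because its exploration bonus outweighs the mean value of `"a"`; calling `backup(edges["b"], leaf_value(0.2, 1.0))` then records a blended evaluation on that edge.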

**Figure 1: Neural network training pipeline and architecture.**

**Figure 2: Strength and accuracy of policy and value networks.**

**Figure 3: Monte Carlo tree search in AlphaGo.**

**Figure 4: Tournament evaluation of AlphaGo.**

**Figure 5: How AlphaGo (black, to play) selected its move in an informal game against Fan Hui.**

**Figure 6: Games from the match between AlphaGo and the European champion, Fan Hui.**

Acknowledgements

We thank Fan Hui for agreeing to play against AlphaGo; T. Manning for refereeing the match; R. Munos and T. Schaul for helpful discussions and advice; A. Cain and M. Cant for work on the visuals; P. Dayan, G. Wayne, D. Kumaran, D. Purves, H. van Hasselt, A. Barreto and G. Ostrovski for reviewing the paper; and the rest of the DeepMind team for their support, ideas and encouragement.

Author information

David Silver and Aja Huang: These authors contributed equally to this work.

Authors and Affiliations

Google DeepMind, 5 New Street Square, London, EC4A 3TW, UK
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, Nal Kalchbrenner, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis
Google, 1600 Amphitheatre Parkway, Mountain View, California, 94043, USA
John Nham & Ilya Sutskever

Contributions

A.H., G.v.d.D., J.S., I.A., M.La., A.G., T.G. and D.S. designed and implemented the search in AlphaGo. C.J.M., A.G., L.S., A.H., I.A., V.P., S.D., D.G., N.K., I.S., K.K. and D.S. designed and trained the neural networks in AlphaGo. J.S., J.N., A.H. and D.S. designed and implemented the evaluation framework for AlphaGo. D.S., M.Le., T.L., T.G., K.K. and D.H. managed and advised on the project. D.S., T.G., A.G. and D.H. wrote the paper.

Corresponding authors

Correspondence to David Silver or Demis Hassabis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Table 1: Details of match between AlphaGo and Fan Hui
Extended Data Table 2: Input features for neural networks
Extended Data Table 3: Supervised learning results for the policy network
Extended Data Table 4: Input features for rollout and tree policy
Extended Data Table 5: Parameters used by AlphaGo
Extended Data Table 6: Results of a tournament between different Go programs
Extended Data Table 7: Results of a tournament between different variants of AlphaGo
Extended Data Table 8: Results of a tournament between AlphaGo and distributed AlphaGo, testing scalability with hardware
Extended Data Table 9: Cross-table of win rates in per cent between programs
Extended Data Table 10: Cross-table of win rates in per cent between programs in the single-machine scalability study
Extended Data Table 11: Cross-table of win rates in per cent between programs in the distributed scalability study


Supplementary information

Supplementary Information

This zipped file contains game records for the 5 formal match games played between AlphaGo and Fan Hui. (ZIP 3 kb)

About this article

Cite this article

Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016). https://doi.org/10.1038/nature16961

Received: 11 November 2015
Accepted: 05 January 2016
Published: 27 January 2016
Issue Date: 28 January 2016
DOI: https://doi.org/10.1038/nature16961
