The main problem with methods such as minimax, alpha-beta search and their variants lies in the huge state spaces of even moderately complex games and in the extremely large number of trajectories the games can take through those state spaces. In general it is simply impossible to take all the alternatives into account.

n-step Look-ahead

One relatively straightforward way to perform search in games with excessively large state spaces is to limit the number of future turns that are taken into account, so that the search tree does not grow too large. The problem is, of course, that the leaves of the tree will then represent states in which the game is not yet over, so we will not know their real value (win/loss, the player's score, etc.). We will therefore need to combine n-step look-ahead with some kind of value function, which will enable us to evaluate games still in progress, i.e. to estimate the chances of winning from each of these states.

n-step look-ahead.
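
The following is a minimal sketch of n-step look-ahead implemented as depth-limited minimax in Python. The game interface (is_terminal, utility, legal_moves, result) and the value function passed in as an argument are hypothetical placeholders introduced only for illustration, not part of any particular library.

    # Minimal sketch of n-step look-ahead (depth-limited minimax).
    # `game` is a hypothetical interface: is_terminal(s), utility(s),
    # legal_moves(s), result(s, move); `value(s)` is the heuristic
    # value function used at the search horizon.

    def lookahead_value(state, game, value, depth, maximizing=True):
        """Estimated value of `state`, searching `depth` turns ahead."""
        if game.is_terminal(state):
            return game.utility(state)     # real value: win/loss, score
        if depth == 0:
            return value(state)            # estimated value at the horizon
        children = [game.result(state, m) for m in game.legal_moves(state)]
        if maximizing:
            return max(lookahead_value(c, game, value, depth - 1, False)
                       for c in children)
        return min(lookahead_value(c, game, value, depth - 1, True)
                   for c in children)

    def best_move(state, game, value, depth):
        """Pick the move whose successor has the highest look-ahead value."""
        return max(game.legal_moves(state),
                   key=lambda m: lookahead_value(game.result(state, m),
                                                 game, value, depth - 1, False))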

There are several ways to obtain such a value function:

  • A heuristic function: Designed by hand, based on existing intuitions about the game; in chess, for example, the ratio of the player's and the opponent's remaining pieces (a minimal sketch of such a function follows this list).
  • Simulation: The value is estimated using the game's model, but instead of a complete search, a less precise but more scalable method is used.
  • Machine learning: The value function is learned from experience gathered in previous games using some reinforcement learning method. Even memoization would be a trivial example of such an approach, provided that the cache is not cleared at the end of each game. More sophisticated methods of this kind include e.g. DeepMind's AlphaGo and AlphaZero [alphago, alphazero].
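
As an illustration of the first option, below is a minimal sketch of a hand-designed heuristic for chess based on the material ratio mentioned above. The board representation (an iterable of piece letters, uppercase for the player, lowercase for the opponent) is an assumption made purely for this example.

    # Minimal sketch of a hand-designed heuristic value function for chess:
    # the ratio of the player's and the opponent's remaining material.
    # Board representation (uppercase = player's pieces, lowercase =
    # opponent's) is assumed for illustration only.

    PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}  # king excluded

    def material_ratio(board):
        """Rough estimate of how favourable the position is materially."""
        own = sum(PIECE_VALUES.get(p.lower(), 0) for p in board if p.isupper())
        opp = sum(PIECE_VALUES.get(p, 0) for p in board if p.islower())
        return own / opp if opp else float("inf")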

Monte Carlo Search

One of the approaches based on n-step look-ahead is Monte Carlo search (MCS). It uses simulation to evaluate non-terminal states: it runs so-called rollouts from each of them. In a rollout, instead of searching, we keep picking random legal actions for both players until the game is over and we know the actual value of the terminal state [essentials]. Even though the actions are selected randomly, running a larger number of such rollouts nevertheless lets us make at least a rough estimate of the results we can expect when playing from a certain state.

Monte Carlo search (according to [essentials], with modifications).
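
Below is a minimal sketch of Monte Carlo search along the lines just described: each available move is evaluated by averaging the outcomes of random rollouts from the resulting state. The game interface (legal_moves, result, is_terminal, utility, to_move) is again a hypothetical placeholder introduced for illustration, not an API taken from [essentials].

    import random

    # Minimal sketch of Monte Carlo search: evaluate each available move by
    # running random rollouts from its successor state and averaging the
    # results. The `game` interface used here is a hypothetical placeholder.

    def rollout(state, game, player):
        """Play random legal moves for both players until the game ends."""
        while not game.is_terminal(state):
            state = game.result(state, random.choice(game.legal_moves(state)))
        return game.utility(state, player)  # actual value of the terminal state

    def monte_carlo_search(state, game, n_rollouts=100):
        """Pick the move with the best average rollout result."""
        player = game.to_move(state)
        def estimate(move):
            successor = game.result(state, move)
            return sum(rollout(successor, game, player)
                       for _ in range(n_rollouts)) / n_rollouts
        return max(game.legal_moves(state), key=estimate)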

Literature

  1. [alphago] Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M. and Dieleman, S., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp.484-489.
  2. [alphazero] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T. and Lillicrap, T., 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
  3. [essentials] Leyton-Brown, K. and Shoham, Y., 2008. Essentials of game theory: A concise multidisciplinary introduction. Synthesis lectures on artificial intelligence and machine learning, 2(1), pp.1-88.