The basic method that extends classical search methods to zero-sum games is called minimax – a combination of the words minimization and maximization [essentials, aima]. As we have already mentioned in the entry on adversarial search, in zero-sum games the goal is to maximize the player’s own score and minimize the opponent’s score. The minimax algorithm is based on the assumption that the player and the opponent are both going to be perfectly rational. Our player should at each step select the action that maximizes their score and the opponent should do the opposite: select the action that is going to minimize our player’s score.
Thus, in minimax the search tree has two types of nodes – maximization (where the player selects the best action for themselves) and minimization, where the agent selects an action for the opponent – the action that would be most harmful to our player’s score). Thus the name of the algorithm.
As an illustration we will be using the game known as tic-tac-toe. A 3×3 tic-tac-toe game board is shown in the figure.
A tic-tac-toe board.
The game is played by two: the first player controls the placement of the circles and the other of the crosses. The game is won by the player that is the first to place three of its symbols next to each other (in a row, column or diagonally). The players can also tie – if neither of them manages this before the game board fills up. If both players play optimally, tic-tac-toe actually cannot be won: it is only possible for the players to tie.
The tic-tac-toe state space (even for a tiny game board) is too large to be easily visualized. In the first step, the player chooses from 9 actions, in the second from 8 actions, etc. – the number of all possible games is therefore 9! = 362880. If we only choose to visualize a part of a game already in progress, the search tree could look as follows.
Minimax: an example in tic-tac-toe.
As we can see, the tree contains red upwards-pointing triangles, which correspond to our player’s decisions (symbol ◯). These are the max nodes: we select actions that maximize our player’s score. The blue downwards-pointing triangles correspond to the opponent’s decisions (symbol ✕). These are min nodes, because the other player is playing against our player and they are trying to minimize our score.
At the top level of our tree, player ◯ is choosing from three actions. At the subsequent level we can see each of the three possible states to which the game can transition. In these it is player ✕’s turn and we assume that they are going to take the action that will cause us the most harm. When selecting an action for our player, we will therefore try to transition into a state from which the opponent can no longer win: no matter what action they choose.
Literature
- [essentials] Leyton-Brown, K. and Shoham, Y., 2008. Essentials of game theory: A concise multidisciplinary introduction. Synthesis lectures on artificial intelligence and machine learning, 2(1), pp.1-88.
- [aima] Russell, S. and Norvig, P., 2002. Artificial intelligence: a modern approach.