Simulation of QUARTO! board game and minimax algorithm
Final project of the “Simulation techniques” subject in the Master in Methodology for behavioral sciences. For this assignment we were given a short description of the QUARTO! board game rules (slightly modified) and a clear objective: to simulate it in R and to design a game strategy. The performance of the game strategy had to be better than a strategy based on random selection of moves.
Game description
The rules of this strategy game are easy: “The wooden pieces have four characteristics: light/dark, tall/short, square/round, or hollow/filled. Each turn, an opponent chooses which piece you play; if you form a line of four pieces with one characteristic in common, you win. If you can’t, you choose a piece for your opponent to play. The game progresses until a winner is decided or all pieces have been played.” (Source:www.hammacher.com)
However, in our case the rules were slightly modified. The player doesn’t choose the opponent’s pieces, she chooses her own pieces. That change makes the game more similar to tic-tac-toe or connect 4.
QUARTO! in code
How can the game be represented in R?
- Pieces: There are $2^4=16$ different pieces that result from the combination of different properties (2 different properties for each of the 4 attributes). Because the specific property or attribute the piece has doesn’t make any difference to the game, I decided to represent the set of all pieces as a 16 x 4 numeric matrix.
# expand.grid does the same but it's a bit slower
create.pieces <- function(){
return(cbind(rep(1:2,each=8,times=1),
rep(3:4,each=4,times=2),
rep(5:6,each=2,times=4),
rep(7:8,each=1,times=8)))
}
create.pieces()
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 1 3 5 8
[3,] 1 3 6 7
[4,] 1 3 6 8
[5,] 1 4 5 7
[6,] 1 4 5 8
[7,] 1 4 6 7
[8,] 1 4 6 8
[9,] 2 3 5 7
[10,] 2 3 5 8
[11,] 2 3 6 7
[12,] 2 3 6 8
[13,] 2 4 5 7
[14,] 2 4 5 8
[15,] 2 4 6 7
[16,] 2 4 6 8
- Board: The board must be able to store information about position, but also about the attributes of every piece, because they are all different. For that reason I created a 3D board, an array of dimensions 4 x 4 x 4:
board <- array(0, dim = c(4,4,4))
, , 1
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
, , 2
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
, , 3
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
, , 4
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
Moving a piece to a certain position in the board is now as easy as doing this:
fourth_piece <- create.pieces()[4,] # [1 3 6 8]
board[2,2,] <- fourth_piece
, , 1
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 1 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
, , 2
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 3 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
, , 3
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 6 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
, , 4
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 8 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
To avoid mistakes and using the same piece or position twice, we need to keep track of the available pieces and positions. We need an object for both. We saw the pieces can be stored in a matrix, and that’s also possible with the available moves.
pieces <- create.pieces()
( positions <- cbind(1:4,rep(1:4,each=4)) )
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 2 2
[6,] 3 2
[7,] 4 2
[8,] 1 3
[9,] 2 3
[10,] 3 3
[11,] 4 3
[12,] 1 4
[13,] 2 4
[14,] 3 4
[15,] 4 4
A complete move by one player could be like this:
board <- array(0,c(4,4,4))
positions <- cbind(1:4,rep(1:4,each=4))
pieces <- create.pieces()
# choose piece and position
pos <- positions[5,] # 1 x 2 vector
piece <- pieces[4,] # 1 x 4 vector
# move piece to position
board[pos[1],pos[2],] <- piece
# update remaining pieces and positions
positions <- positions[-5,] # 15 x 2 matrix
pieces <- pieces[-4,] # 15 x 2 matrix
Now we are capable a simulating a complete game (link to code at the end), but the way players choose their pieces and moves is totally random at this moment.
Game strategy: Minimax and alpha-beta
In order to create an artificial agent that plays rationally I decided to implement the minimax algorithm with alpha-beta pruning. Technically they are different algorithms, but I consider the latter as an improvement or extension of the former.
And how do they work? Good explanations can be found in Russell & Norvig (2020) and Wikipedia:
- Minimax: “Given a game tree, the optimal strategy can be determined from the minimax value of each state in the tree (…). The minimax value is the utility (for player A, for example) of being in the corresponding state, assuming that both players play optimally from there to the end of the game.” The utility given by the payoff function “indicates how good it would be for a player to reach that position”.Obviously, “The minimax value of a terminal state is just its utility” (score associated with win, loss, draw or other complex states). Minimax works by choosing the next move based on these utilities.
So, for instance, “If player A can win in one move, his best move is that winning move. If player B knows that one move will lead to the situation where player A can win in one move, while another move will lead to the situation where player A can, at best, draw, then player B’s best move is the one leading to a draw. Late in the game, it’s easy to see what the “best” move is. The Minimax algorithm helps find the best move, by working backwards from the end of the game. At each step it assumes that player A is trying to maximize the chances of A winning, while on the next turn player B is trying to minimize the chances of A winning (i.e., to maximize B’s own chances of winning).”
- Alpha-beta pruning: Minimax is a recursive algorithm that works by searching or exploring the states of the game tree. That becomes a problem even with moderately complex games because “the number of game states is exponential in the depth of the tree”. That means that “raw” or naïve minimax is often impractical. However, we can use the technique of alpha-beta pruning to prune (leave unexplored) “large parts of the tree that make no difference to the outcome”. It’s basically the same as minimax but “It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further.”
I’ve fundamentally based the implementation of the algorithm on the pseudocode provided by Russell & Norvig (2020). However, it’s not exactly the same. I reproduce here the pseudocode that best represents the program I designed, and again, the link to the actual code can be found at the end of the post.
FUNCTION AB_MINIMAX(board,ply,player,positions,pieces,depth=0,alpha=-infinity,beta=infinity)
best = list(value,pos,piece)
terminal_state = is_terminal(board,ply+depth)
IF terminal_state OR depth == 4 THEN
best.value = payoff(terminal_state,player,depth)
RETURN best
ELSE
IF player == 1 THEN
v = -infinity
FOR each position in positions
FOR each piece in pieces
updated = result(board,positions,position,pieces,piece)
output = AB_MINIMAX(updated.board,ply,-player,
updated.positions,updated.pieces,
depth+1,alpha,beta)
IF output.value > v THEN
v = output.value
best.value = output.value
best.pos = position
best.piece = piece
alpha = max(alpha,v)
ENDIF
IF v >= beta THEN
RETURN best
ENDIF
ENDFOR
ENDFOR
ELSE
v = infinity
FOR each position in positions
FOR each piece in pieces
updated = result(board,positions,position,pieces,piece)
output = AB_MINIMAX(updated.board,ply,-player,
updated.positions,updated.pieces,
depth+1,alpha,beta)
IF output.value < v THEN
v = output.value
best.value = output.value
best.pos = position
best.piece = piece
alpha = max(alpha,v)
ENDIF
IF v <= alpha THEN
RETURN best
ENDIF
ENDFOR
ENDFOR
ENDIF
ENDIF
ENDFUNCTION
The issue of efficiency
I originally tried to make everything in R, but I quickly noticed that the program was terribly slow. To reduce the running time I decided to take advantage of the package rcpp
and I re-wrote almost everything in C++ code. The improvement was remarkable. Here you can see the time needed to make a move after 12 “plies” or turns:
microbenchmark::microbenchmark(
'R'={AB_MINIMAX_R(demo$board,ply=13,player=1,demo$positions_left,demo$pieces_left)},
'C++'={AB_MINIMAX(demo$board,ply=13,player=1,demo$positions_left,demo$pieces_left)}
)
Unit: microseconds
expr min lq mean median uq max neval cld
R 13242.1 14604.3 20137.87 16809.8 22100.9 57384.4 100 b
C++ 259.1 296.1 375.22 322.5 407.2 1144.8 100 a
Link to code.
Link to the report.
Comments
- Human vs. “machine”: Eventhough I didn’t mention it, the game has been designed so that it can be played by a real human. I’ve made a second version of the main program with an user interface that has been modified to be more “attractive” and less confusing (converting the 3D board to a bidimensional matrix, for example).
References
Russell, S. & Norvig, P. (2020). Adversarial Search and Games. In S. Russell y P. Norvig, Artificial Intelligence: a modern approach (4a ed) pp.(146-180). Pearson.