# Applying Expectimax to Yahtzee

**2020-01-17**

## Introduction

One of the games that my family likes to play when we're waiting around for something to come out of the oven is Yahtzee. Yahtzee is a dice game where players take turns trying to complete a set of roll combinations with five dice. Each turn, you get three rolls: an initial roll and two rerolls, from which you can hold back any dice you wish to save. The highest scoring roll is a Yahtzee, which is a five of a kind. Other scoring combinations include the Small and Large Straight, of four and five in a row respectively, and the Full House, which is two of one and three of another, as in poker.

{{< figure src="/img/yahtzee/score_sheet.png" caption="Figure 1: The full score sheet for a game of Yahtzee" width="450 px" >}}

Despite being a simple luck-based game, Yahtzee has some basic strategic elements, which beginners quickly pick up. For example, the top section of the score sheet introduces a nonlinearity into the scoring: if your top-section total reaches 63 points, the equivalent of scoring three of a kind in each of the six numbers, you receive a bonus of 35 points. This makes completing the top section in good form a priority during most games, since most of the lower section can be completed more or less opportunistically.

If, on your turn, you fail to complete any of the combinations, you can elect to score the sum of your dice in your Chance row, or, if that is used up, score a zero for any of the rows. This tactic is often used to get an extra turn to complete the top section for the bonus points. It's sometimes wise to score your Yahtzee row for zero points in the hopes of rolling three sixes the next turn.

Yahtzee seems like a game that a program should be able to play capably. All the probabilities involved are easy to calculate, and the state space is not *that* large, assuming we can reduce the \(6^5 = 7776\) distinct rolls by exploiting symmetry. A decent Yahtzee-playing agent should exhibit at least some of the strategies described above. More specifically, we want to build a bot that can:

- Roll coherently: save dice in a way that points to a goal combination for the turn.
- Score coherently: pick the highest-scoring row among options of roughly equal probability.
- Plan ahead, to the extent the game supports it. For example, scoring `55556` as the Four of a Kind for 26 points is probably not worth it compared to racking up 20 points on your 5's, and getting a good cushion for the top section bonus points.
- Tactically retreat: taking 0 on the Yahtzee row is correct commonly enough that we should expect to see this behavior from the bot, too.

## Expectimax

Our Yahtzee playing bot is going to use a classic algorithm: Expectimax. Expectimax is a tree search algorithm, which means that it treats the game as a tree of possible playouts. Each state of the game is a node of the tree, and the job of the algorithm is to determine the expected value to the player of every game state that it explores. By exploring the tree to the fullest extent of our ability, our bot will take into account as many outcomes as possible, and compute an accurate estimate of its expected score. Then, it will simply make the decision that gives the highest expected score.

Games like chess and checkers are deterministic, in the sense that there is no randomness involved. In these games, the value of a given state of the game is fully determined, assuming that both players play "perfectly". For a random game like Yahtzee, we have to account for the fact that a given decision can have multiple results, depending on how the dice fall. Therefore, expectimax treats game states as falling into two different classes:

- Deterministic nodes, where it is up to the player to make a decision. The value of a deterministic node is equal to the maximum value of the possible future states, since the player will choose the highest-scoring decision available.
- Chance nodes, where the next outcome is determined randomly. The algorithm gives these nodes a value equal to the *average* of the possible next outcomes.

The expectimax algorithm is a recursive search over the game tree, using these two node types as a template. It is common to apply a maximum depth to the game tree, past which the algorithm terminates its search and applies a heuristic evaluation to the game state. The pseudocode for expectimax, then, looks something like this:

```
def expectimax(state, depth):
    if depth == MAX_DEPTH:
        return heuristic_evaluation(state)
    child_scores = [expectimax(next_state, depth + 1)
                    for next_state in state.children]
    if state.is_deterministic:
        return max(child_scores)
    else:
        return mean(child_scores)
```

In our Yahtzee implementation, all chance nodes represent rolls of the dice. A
deterministic node occurs immediately *after* each roll, and requires the agent
to make a choice about whether to hold dice, reroll, or score. A chance node
occurs immediately after each decision to reroll. In the diagram below, every
roll leads to a deterministic node for each possible outcome, drawn as a box,
and every decision point leads to a chance node for each possible decision,
drawn as an oval.
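To put rough numbers on this branching (the helper names here are mine, for illustration): each decision node has at most \(2^5 = 32\) possible holds, and a chance node that rerolls \(k\) dice fans out into \(6^k\) ordered outcomes.

```
/// Each of the five dice is independently held or rerolled,
/// so a decision node has at most 2^5 = 32 hold choices.
fn hold_choices() -> u32 {
    2u32.pow(5)
}

/// A chance node that rerolls `k` of the dice has 6^k ordered outcomes.
fn reroll_outcomes(k: u32) -> u32 {
    6u32.pow(k)
}
```

Multiplying these factors level by level is what makes the raw tree explode; a full reroll alone contributes 7776 children.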

{{< figure src="/img/yahtzee/tree.png" >}}

With the structure of the game broken down into states and the two different kinds of transitions between them, deterministic and random, we are ready to write a first version of our future Yahtzee world champion bot. We'll be using Rust.

## A First Attempt

As a first attempt to apply the expectimax algorithm to Yahtzee, let's define a pared-down version of the game which only involves the upper section. That is, the score sheet consists of six rows, each of which can be optionally filled in with a score. When all six of the rows are filled, the game is done:

```
struct Scoresheet {
    ones: Option<u8>,
    twos: Option<u8>,
    threes: Option<u8>,
    fours: Option<u8>,
    fives: Option<u8>,
    sixes: Option<u8>,
}
```
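Completion and running-total checks over the sheet might look like the following sketch (the helper names are my own, not necessarily the real implementation; the struct is repeated to keep the example self-contained):

```
struct Scoresheet {
    ones: Option<u8>,
    twos: Option<u8>,
    threes: Option<u8>,
    fours: Option<u8>,
    fives: Option<u8>,
    sixes: Option<u8>,
}

impl Scoresheet {
    /// All six rows as a fixed-size array, which makes iteration easy.
    fn rows(&self) -> [Option<u8>; 6] {
        [self.ones, self.twos, self.threes, self.fours, self.fives, self.sixes]
    }

    /// The game is done when every row has been filled in.
    fn completed(&self) -> bool {
        self.rows().iter().all(|row| row.is_some())
    }

    /// Sum of the rows scored so far.
    fn total(&self) -> u32 {
        self.rows().iter().flatten().map(|&score| score as u32).sum()
    }
}
```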

The entire game state comprises a score sheet, plus the current turn. A turn is simply a `Roll`, marked as the first, second, or third chance of that turn. We'll model `Chance` as an enum, and `Roll` as an array of five dice. Since dice go up to six, we can use the `u8` primitive type:

```
enum Chance {
    First,
    Second,
    Third,
}

struct Roll([u8; 5], Chance);
```

We're almost able to define our `Game` struct, which will encapsulate the entire state of a game of Yahtzee. The last thing we need is a way to represent which dice are held out from a roll. Our implementation will distinguish between the state of a game immediately *before* a roll, when the set of dice to hold has been decided, and the state of a game immediately *after* a roll, when no decision to reroll or score has yet been made. The reasoning is that, per the above high-level description of expectimax, chance nodes are treated differently from deterministic nodes. The state immediately before a roll is a chance node, while the state immediately after is a deterministic decision point.

The last two structs we need, then, are as follows:

```
struct Hold([bool; 5]);

struct Game {
    scores: Scoresheet,
    roll: Roll,
    hold: Option<Hold>,
}
```
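As an illustration of how a `Hold` interacts with a roll, here is a sketch of generating every reroll outcome for the dice not held (the free-function form and name are my own; the real implementation may differ, and `Hold` is repeated for self-containment):

```
struct Hold([bool; 5]);

/// Enumerate every ordered outcome of rerolling the dice not held.
/// With k dice rerolled this yields 6^k rolls, up to 7776 for a full reroll.
fn rerolls(dice: [u8; 5], hold: &Hold) -> Vec<[u8; 5]> {
    let mut results = vec![dice];
    for i in 0..5 {
        if hold.0[i] {
            continue; // held dice keep their face
        }
        // Replace position i with every face 1..=6, for every partial result.
        results = results
            .into_iter()
            .flat_map(|roll| {
                (1u8..=6).map(move |face| {
                    let mut next = roll;
                    next[i] = face;
                    next
                })
            })
            .collect();
    }
    results
}
```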

With these domain models in place, we can implement a rudimentary version of the expectimax algorithm. In the interest of brevity, I have omitted some glue code and combinatorial functions which yield the set of all possible decisions or rolls. The core of our implementation is very similar to the above pseudocode.

### Expecting the Maximum

Our expectimax function accepts a game state, and an integer telling it how far down the game tree this invocation is. If the game is done, or if we have reached the maximum depth, then we refrain from recursing further, and instead hand the job off to our heuristic evaluation function.

```
fn expectimax(game: &Game, depth: u32) -> f64 {
    if depth == MAX_DEPTH || game.completed() {
        return heuristic_evaluation(game);
    }
```

Our heuristic evaluation assumes a likely score of `2.5 * n` for each empty row `n`, plus the points already scored on filled rows. Therefore, the heuristic evaluation converges to the actual score as the game progresses. There is substantial room for improvement here, but it is sufficient to motivate the algorithm to try for 3 or more dice on each row, which is the baseline for a successful top section in a full game.
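A sketch of such a heuristic, assuming the sheet's rows can be viewed as an array (the representation here is my own, not the exact implementation):

```
/// Rough value of a sheet: points already banked, plus an assumed
/// 2.5 * n for each empty row n -- a bit below the 3 * n pace needed
/// for the top-section bonus.
fn heuristic_evaluation(rows: &[Option<u8>; 6]) -> f64 {
    rows.iter()
        .enumerate()
        .map(|(i, row)| {
            let n = (i + 1) as f64; // index 0 holds the ones, and so on
            match row {
                Some(score) => *score as f64, // already scored: exact
                None => 2.5 * n,              // empty: likely score
            }
        })
        .sum()
}
```

On an empty sheet this evaluates to \(2.5 \times 21 = 52.5\), and as rows are filled in, their guessed values are replaced by the actual scores.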

Here is the main body of the `expectimax` function. Whether a game state is deterministic or random is determined by whether there is an active hold of the dice: if there is, we must be about to roll the rest of the dice.

```
    match game.hold {
        // We just rolled, and have to decide whether to reroll or to score
        // our dice.
        None => {
            let all_options = game.options();
            let (_, _, expectation) = best_option(game, depth + 1, all_options);
            expectation
        },
        // We made a decision to reroll, and are looking for the expected
        // value, averaged over all possible roll outcomes.
        Some(hold) => {
            let rerolls = game.roll.rerolls(&hold);
            let stats: StatsCounter = rerolls.into_iter()
                .map(|roll| expectimax(&game.with_roll(roll), depth + 1))
                .collect();
            stats.avg()
        },
    }
}
```

Note that we are making use of a cool feature of Rust: iterator adapters. Our `StatsCounter` struct can be constructed from an iterator of floats. It automatically computes both the average and variance of the stream, by accumulating a count, sum, and sum of squares.
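A minimal version of such a counter, as a sketch of the idea (the real struct may differ):

```
use std::iter::FromIterator;

/// Accumulates count, sum, and sum of squares for a stream of samples.
#[derive(Default)]
struct StatsCounter {
    count: u64,
    sum: f64,
    sum_sq: f64,
}

impl StatsCounter {
    fn push(&mut self, x: f64) {
        self.count += 1;
        self.sum += x;
        self.sum_sq += x * x;
    }

    fn avg(&self) -> f64 {
        self.sum / self.count as f64
    }

    /// Population variance: E[x^2] - E[x]^2.
    fn variance(&self) -> f64 {
        let mean = self.avg();
        self.sum_sq / self.count as f64 - mean * mean
    }
}

/// Lets a `StatsCounter` be built with `.collect()` on an iterator of floats.
impl FromIterator<f64> for StatsCounter {
    fn from_iter<I: IntoIterator<Item = f64>>(iter: I) -> Self {
        let mut stats = StatsCounter::default();
        for x in iter {
            stats.push(x);
        }
        stats
    }
}
```

The `FromIterator` implementation is what makes `rerolls.into_iter().map(...).collect()` work in the match arm above.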

### Weighing Options

Along with the `expectimax` function, a core piece of the algorithm is the function which determines the best option from among several. This has been factored out of `expectimax` because it's also used to actually drive the game play. As we would expect, it calls back into `expectimax` to determine the best possible play. To serve both callers, it returns a triple of `(decision, game, score)`. Of these, `expectimax` only cares about the score.

```
/// Return the best option from a list of decisions, along with the game
/// resulting from that decision, and its expectimax score.
fn best_option(
    game: &Game,
    depth: u32,
    options: Vec<Decision>,
) -> (Decision, Game, f64) {
    assert!(!options.is_empty(), "Empty list of options");
    options.into_iter()
        .map(|decision| {
            let resulting_game = game.resulting(decision);
            let expectation = expectimax(&resulting_game, depth);
            (decision, resulting_game, expectation)
        })
        .max_by(|(_, _, e1), (_, _, e2)| {
            e1.partial_cmp(e2).expect("Tried to compare a NaN")
        })
        .unwrap()
}
```

### Playing the Game

Once we have the `expectimax` and `best_option` functions defined, actually running the game is just a few lines of code. While the game is not yet complete, we execute the best available decision. The outer loop needs to print out a description of the decision taken, and update the actual game state to the result of that decision:

```
pub fn play_game(max_depth: u32) -> Game {
    let mut rng: SmallRng = SmallRng::from_entropy();
    let mut game = Game::new(&mut rng);
    while !game.completed() {
        // Make the best available decision
        let options = game.options();
        let (best, resulting_game, _) = best_option(&game, 0, options);
        // Print out the decision
        println!("{}", best.describe(&game));
        game = resulting_game;
        if game.hold.is_some() {
            // Make a roll
            game = game.roll(&mut rng);
        }
    }
    game
}
```

## A Promising Start

A typical game with this first implementation looks like this:

```
⚅⚅⚄⚂⚃
Rerolling: ⚅⚅⚄⚂⚃
Rerolling: ⚅⚅⚄⚅⚂
Scoring sixes ⚅⚅⚃⚅⚄
| 1s | 2s | 3s | 4s | 5s | 6s |
| _ | _ | _ | _ | _ | 18 |
Rerolling: ⚁⚄⚃⚅⚂
Rerolling: ⚅⚄⚅⚀⚁
Scoring ones ⚁⚂⚄⚀⚃
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | _ | _ | _ | _ | 18 |
Rerolling: ⚄⚃⚁⚃⚅
Rerolling: ⚃⚃⚀⚃⚁
Scoring fours ⚃⚃⚄⚃⚀
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | _ | _ | 12 | _ | 18 |
Rerolling: ⚂⚂⚂⚃⚃
Rerolling: ⚂⚂⚂⚂⚅
Scoring threes ⚂⚂⚂⚂⚅
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | _ | 12 | 12 | _ | 18 |
Rerolling: ⚁⚅⚅⚁⚁
Rerolling: ⚁⚄⚀⚁⚁
Scoring twos ⚁⚄⚅⚁⚁
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | 6 | 12 | 12 | _ | 18 |
Rerolling: ⚄⚂⚀⚃⚄
Rerolling: ⚄⚅⚂⚅⚄
Scoring fives ⚄⚅⚅⚀⚄
| 1s | 2s | 3s | 4s | 5s | 6s |
| 1 | 6 | 12 | 12 | 10 | 18 |
```

We certainly haven't mastered the game. But our bot already demonstrates a very basic grasp of tactics. In the very first turn of this example game, it holds the pair of sixes, and attempts to roll some more. In the second turn, it first holds the five, and when it fails to roll another, switches to holding the 1 in order to minimize the damage.

While this is somewhat promising, we've actually already run up against the limits of this data model. At a search depth of 3, our expectimax implementation is examining upwards of 250 million leaf nodes. Granted, this takes just a few seconds per turn, but the growth is exponential in the tree depth. In order to do better, we need to do two things:

- Take advantage of the symmetries inherent in the game to eliminate some branches of the tree, or
- Memoize tree evaluations, so that we aren't wasting work traversing the same branches more than once.

Looked at from a certain point of view, memoization is just a different way of taking advantage of symmetry. Paths through the tree which arrive at the same state will be deduplicated with a good memoization strategy. But comparing and deduplicating two game states has a nonzero runtime cost. It would be even better if we could keep the two symmetric paths from both appearing at all, by being more clever about the way we generate game states.
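To see how much there is to gain from symmetry, consider just the rolls: the order of the dice never matters, so sorting each roll collapses the \(6^5 = 7776\) ordered rolls into only 252 distinct multisets. A quick sketch to verify the count:

```
use std::collections::HashSet;

/// Count distinct five-dice rolls when order is ignored, by canonicalizing
/// each ordered roll to its sorted form.
fn distinct_sorted_rolls() -> usize {
    let mut seen = HashSet::new();
    for a in 1u8..=6 {
        for b in 1..=6 {
            for c in 1..=6 {
                for d in 1..=6 {
                    for e in 1..=6 {
                        let mut roll = [a, b, c, d, e];
                        roll.sort();
                        seen.insert(roll);
                    }
                }
            }
        }
    }
    seen.len()
}
```

That is a factor of roughly 30 fewer chance-node children per full reroll, before we even touch memoization.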

In the next post, we'll improve our domain modeling to take better advantage of the information that the game of Yahtzee actually uses, and extend the depth of our tree search much farther.