"Chess-GPT's World Model Manipulated to Gauge Its Understanding of Chess: Details Inside"

Manipulating Chess-GPT’s World Model

In my previous post, I introduced Chess-GPT, a language model I trained to predict the next character of a chess game given a PGN string (1.e4 e5 2.Nf3 …). Simply by training to output the next character, it learns to compute the state of the chess board and to estimate the skill level of the players, given an arbitrary PGN string as input. I demonstrated this using linear probes, which are classifiers that take the model’s activations as input and predict the board sta...
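
To make the idea of a linear probe concrete, here is a minimal sketch in plain Python. It is not the code used to train the Chess-GPT probes. Everything in it is an assumption made for illustration: the "activations" are synthetic 8-dimensional vectors rather than real model activations, the label is a hypothetical three-way class for a single square (0 = empty, 1 = white piece, 2 = black piece), and the helper names (`train_linear_probe`, `make_synthetic_activations`, and so on) are made up. The point it shows is that the probe is a single linear map followed by a softmax, so it can only succeed if the information is already linearly readable from the activations.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probe_logits(weights, biases, activation):
    """The whole probe: one weight row and one bias per class, no hidden layers."""
    return [
        sum(w * a for w, a in zip(row, activation)) + b
        for row, b in zip(weights, biases)
    ]


def train_linear_probe(activations, labels, num_classes, lr=0.1, epochs=30):
    """Fit a softmax-regression probe with plain SGD on cross-entropy loss."""
    dim = len(activations[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(activations, labels):
            probs = softmax(probe_logits(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is (p_c - 1[c == y]).
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for i in range(dim):
                    weights[c][i] -= lr * grad * x[i]
    return weights, biases


def predict(weights, biases, activation):
    """Return the class with the highest probe score."""
    logits = probe_logits(weights, biases, activation)
    return max(range(len(logits)), key=lambda c: logits[c])


def make_synthetic_activations(n, dim=8, num_classes=3, noise=0.5, seed=0):
    """Hypothetical stand-in for model activations: class y is encoded
    linearly along coordinate y, with Gaussian noise on every coordinate."""
    rng = random.Random(seed)
    activations, labels = [], []
    for _ in range(n):
        y = rng.randrange(num_classes)
        x = [rng.gauss(0.0, noise) for _ in range(dim)]
        x[y] += 2.0
        activations.append(x)
        labels.append(y)
    return activations, labels


def accuracy(weights, biases, activations, labels):
    """Fraction of examples the probe classifies correctly."""
    hits = sum(predict(weights, biases, x) == y for x, y in zip(activations, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train_x, train_y = make_synthetic_activations(300, seed=0)
    test_x, test_y = make_synthetic_activations(100, seed=1)
    weights, biases = train_linear_probe(train_x, train_y, num_classes=3)
    print("held-out probe accuracy:", accuracy(weights, biases, test_x, test_y))
```

With real activations, the same recipe would apply per square and per layer: collect activations at a chosen layer, pair each with the true square label at that point in the game, and check held-out accuracy. High accuracy from such a simple classifier is the evidence that the model represents the board internally.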