OpenAI's GPT-3.5-Turbo-Instruct Plays Chess at Advanced Amateur Level; Researcher Debunks Cheating Theories, Demonstrates Other LLMs Can Play Well with Precise Prompting

OK, I can partly explain the LLM chess weirdness now

We recently talked about a mystery: All large language models (LLMs) are terrible at chess. All, that is, except for gpt-3.5-turbo-instruct, which for some reason can play at an advanced amateur level. This is despite the fact that this model is more than a year old and much smaller than recent models. What’s going on?

I suggested four possible explanations:

Theory 1: Large enough base models are good at chess, but this doesn’t persist through instruction tuning to chat models.

Theory 2: For som...