OK, I can partly explain the LLM chess weirdness now
We recently talked about a mystery: All large language models (LLMs) are terrible at chess. All, that is, except for gpt-3.5-turbo-instruct, which for some reason can play at an advanced amateur level. This is despite the fact that this model is more than a year old and much smaller than recent models. What’s going on?
I suggested four possible explanations:
Theory 1: Large enough base models are good at chess, but this doesn’t persist through instruction tuning to chat models.
Theory 2: For som...
Read more at dynomight.net