AI Researchers Develop Achievement-Based Evaluation Method for LLMs Playing Text Adventures

Evaluating LLMs Playing Text Adventures

When we first set up the llm so that it could play text adventures, we noted that none of the models we tried with it were any good at it. We dreamed of a way to compare them, but all I could think of was setting a goal far into the game and seeing how long it takes them to get there. I just realised there’s a better way to do it.

Evaluation against achievements

What we’ll do is set a low-ish turn limit and see how much they manage to accomplish in that time.[1]

[1] Another alternative for mo...
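
To make the turn-limited idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration rather than the actual harness: the `step` and `choose_command` callables, the toy two-room game, the scripted player standing in for a model, and the choice to detect achievements by matching substrings in the game's output are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def count_achievements(
    step: Callable[[str], str],
    choose_command: Callable[[str], str],
    achievements: Dict[str, Callable[[str], bool]],
    turn_limit: int = 30,
) -> Set[str]:
    """Play for exactly turn_limit turns and return the achievements reached."""
    earned: Set[str] = set()
    observation = step("look")  # opening description, not counted as a turn
    for _ in range(turn_limit):
        observation = step(choose_command(observation))
        # An achievement counts as soon as its predicate matches the output.
        earned.update(
            name for name, reached in achievements.items() if reached(observation)
        )
    return earned


def make_toy_game() -> Callable[[str], str]:
    """Hypothetical stand-in for a real text adventure interpreter."""
    state = {"room": "cellar", "has_lamp": False}

    def step(command: str) -> str:
        if command == "take lamp" and state["room"] == "cellar":
            state["has_lamp"] = True
            return "Taken: brass lamp."
        if command == "go up" and state["has_lamp"]:
            state["room"] = "kitchen"
            return "You climb into the kitchen."
        if command == "go up":
            return "It is too dark to find the stairs."
        return f"You are in the {state['room']}."

    return step


def scripted_player(commands: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a model: replays fixed commands, then just looks around."""
    remaining = iter(commands)
    return lambda observation: next(remaining, "look")


# Hypothetical achievements, each a predicate over the latest game output.
ACHIEVEMENTS = {
    "got lamp": lambda text: "brass lamp" in text,
    "reached kitchen": lambda text: "kitchen" in text,
}

if __name__ == "__main__":
    player = scripted_player(["take lamp", "go up"])
    earned = count_achievements(make_toy_game(), player, ACHIEVEMENTS, turn_limit=2)
    print(f"{len(earned)}/{len(ACHIEVEMENTS)} achievements: {sorted(earned)}")
```

The score for a model is then just the number of achievements earned within the limit, which makes models comparable even when none of them gets far into the game.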