Evaluating LLMs Playing Text Adventures
When we first set up an LLM to play text adventures, we noted that none of the
models we tried were any good at it. We dreamed
of a way to compare them, but all I could think of was setting a goal far into
the game and seeing how long it takes them to get there. I just realised there’s
a better way to do it.
Evaluation against achievements
What we’ll do is set a low-ish turn limit and see how much each model manages to
accomplish in that time.[1]

[1] Another alternative for mo...
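A minimal sketch of what such a turn-limited evaluation could look like, in Python. Everything named here is hypothetical and only for illustration: `ToyGame` stands in for a real text adventure, `always_north` stands in for the LLM choosing a command, and each `Achievement` is assumed to be detectable from the game's output text. The real setup may well track achievements differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Achievement:
    name: str
    # Assumption: an achievement can be detected from a single game output.
    reached: Callable[[str], bool]


class ToyGame:
    """Three-room stand-in for a real text adventure (hypothetical)."""

    def __init__(self) -> None:
        self.room = 0

    def step(self, command: str) -> str:
        # Only "north" does anything, and only until the last room.
        if command == "north" and self.room < 2:
            self.room += 1
        return f"You are in room {self.room}."


def evaluate(choose_command, game, achievements, turn_limit):
    """Play at most turn_limit turns and return the names of achievements earned."""
    earned = set()
    output = ""
    for _ in range(turn_limit):
        command = choose_command(output)  # in practice, this would query the model
        output = game.step(command)
        for achievement in achievements:
            if achievement.reached(output):
                earned.add(achievement.name)
    return earned


def always_north(_last_output: str) -> str:
    """Trivial stand-in for a model: ignores the output and walks north."""
    return "north"


ACHIEVEMENTS = [
    Achievement("reach room 1", lambda text: "room 1" in text),
    Achievement("reach room 2", lambda text: "room 2" in text),
]

short_run = evaluate(always_north, ToyGame(), ACHIEVEMENTS, turn_limit=1)
long_run = evaluate(always_north, ToyGame(), ACHIEVEMENTS, turn_limit=5)
print(len(short_run), len(long_run))
```

The score for a model is then just the number of achievements it earns within the limit, which makes different models directly comparable even if none of them gets far into the game.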
Read more at entropicthoughts.com