"New Methods Developed for Detecting Under-Trained Tokens in Large Language Models, Improving Efficiency and Safety"

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

Abstract: The disconnect between tokenizer creation and model training in language models has been known to allow certain inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted behaviour. Although such 'glitch tokens', which are present in the tokenizer vocabulary but nearly or fully absent in training, have been observed across a variety of different models, a consistent way of identifying them has been missing. We present a comprehensive analysis of Large Langu...