Allen Institute for AI releases Bolmo 7B and 1B, the first fully open byte-level language models. They process raw UTF-8 bytes without a tokenizer, which improves handling of multilingual and noisy text, and were built by retrofitting existing Olmo 3 models through two-stage training.

Bolmo’s architecture unlocks efficient byte‑level LM training without sacrificing quality

Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche and make it practical at scale, the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models that builds on its Olmo 3 models by “byteifying” them and reusing their backbone and capabilities. The institute launched two versions, Bolmo 7B and Bolmo 1B, which are “the first fully open byte-level langu...
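To make "processing raw UTF-8 bytes without a tokenizer" concrete, here is a minimal Python sketch of what byte-level input looks like in general. It is an illustration, not Bolmo's actual input pipeline: the helper names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical, and a real model may add special tokens or offsets on top of the 256 byte values.

```python
# Illustrative only: generic byte-level encoding, not Ai2's implementation.
# Helper names below are hypothetical.

def text_to_byte_ids(text: str) -> list[int]:
    """Map text to a sequence of integers in 0..255, one per UTF-8 byte."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Invert text_to_byte_ids by decoding the bytes back to a string."""
    return bytes(ids).decode("utf-8")


if __name__ == "__main__":
    # Clean English, accented text, Japanese, and a typo-ridden string all
    # map into the same fixed 256-value alphabet; nothing is out-of-vocabulary.
    samples = ["hello", "héllo", "日本語", "teh qiuck brwon fox"]
    for s in samples:
        ids = text_to_byte_ids(s)
        assert all(0 <= i < 256 for i in ids)
        assert byte_ids_to_text(ids) == s  # encoding is lossless
        print(f"{s!r}: {len(s)} chars -> {len(ids)} byte ids")
```

The trade-off this sketch makes visible is sequence length: non-ASCII characters expand to several bytes each, so byte-level inputs are longer than their character or subword counterparts.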