"State-space Models Fail to Outperform Transformers in Large Language Models, Struggle with State Tracking and Sequential Computation: Study"

The Illusion of State in State-Space Models

Abstract: State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNN...