News Score: Score the News, Sort the News, Rewrite the Headlines

The Illusion of State in State-Space Models

Abstract: State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNN...

Read more at arxiv.org
