Titans: New AI Architecture Combines Long-Term Memory and Attention, Outperforming Transformers in Multiple Tasks

Titans: Learning to Memorize at Test Time

Abstract: Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. W...
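
The trade-off described in the abstract, a fixed-size recurrent state versus attention's quadratic number of pairwise scores, can be illustrated with a small toy sketch. This is not the Titans architecture or any method from the paper. The function names (`recurrent_summary`, `attention_scores`, `make_tokens`), the exponential-moving-average update, and the non-causal dot-product scoring are all assumptions chosen only to make the scaling visible.

```python
import math

def recurrent_summary(tokens, state_size=4, decay=0.9):
    """Toy recurrent model: fold the sequence into a fixed-size state.

    Hypothetical update rule (exponential moving average), used only to
    show that the state size does not depend on the sequence length.
    """
    state = [0.0] * state_size
    for tok in tokens:
        state = [decay * s + (1 - decay) * x for s, x in zip(state, tok)]
    return state

def attention_scores(tokens):
    """Toy non-causal attention: softmax of dot products between all token pairs."""
    scores = []
    for q in tokens:
        row = [sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(r - m) for r in row]
        z = sum(exps)
        scores.append([e / z for e in exps])
    return scores

def make_tokens(n, dim=4):
    """Deterministic placeholder token vectors (illustrative data only)."""
    return [[math.sin(i + j) for j in range(dim)] for i in range(n)]

for n in (8, 16, 32):
    toks = make_tokens(n)
    state = recurrent_summary(toks)
    attn = attention_scores(toks)
    # sequence length, recurrent state size, number of attention scores
    print(n, len(state), sum(len(row) for row in attn))
# prints:
# 8 4 64
# 16 4 256
# 32 4 1024
```

Doubling the sequence length leaves the recurrent state at four numbers but quadruples the number of attention scores, which is the quadratic cost the abstract refers to.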