ATLAS: New AI Memory Module Outperforms Transformers, Achieves 80% Accuracy on 10M-Context Tasks

ATLAS: Learning to Optimally Memorize the Context at Test Time

Abstract: Transformers have been established as the most popular backbones in sequence modeling, mainly due to their effectiveness in in-context retrieval tasks and their ability to learn at scale. Their quadratic memory and time complexity, however, bounds their applicability to longer sequences and so has motivated researchers to explore effective alternative architectures such as modern recurrent neural networks (a.k.a. long-term recurrent memory modules). Despite their...