
HMT: Hierarchical Memory Transformer for Long Context Language Processing

Abstract: Transformer-based large language models (LLMs) have been widely used in language processing applications. However, most of them restrict the context window within which the model can attend to every token of the input. Previous work on recurrent models can memorize past tokens to enable unlimited context while remaining effective. However, these models have "flat" memory architectures, which are limited in how they select and filter information. Since humans are ...
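As a rough illustration (not taken from the paper), the "flat" memory design the abstract contrasts with HMT can be pictured as a single fixed-size bank of memory tokens carried across input segments, in the spirit of recurrent memory transformers: every past segment is compressed into the same bank, with no hierarchy for selecting or filtering what is kept. The sketch below is an assumption for illustration only; the module names, sizes, and update rule are not from the paper.

```python
# Minimal sketch of a "flat" segment-level memory recurrence (illustrative only,
# not HMT): one fixed-size memory bank is overwritten after every segment, so all
# past information is squeezed into the same slots with no selection hierarchy.
import torch
import torch.nn as nn

class FlatMemorySegmentModel(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, mem_tokens=8, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # A single flat bank of learned memory tokens.
        self.init_memory = nn.Parameter(torch.randn(1, mem_tokens, d_model))
        self.mem_tokens = mem_tokens

    def forward(self, segments):
        """segments: list of LongTensor [batch, seg_len] token-id chunks."""
        memory = self.init_memory.expand(segments[0].size(0), -1, -1)
        outputs = []
        for seg in segments:
            # Prepend the memory bank to the current segment and encode jointly.
            x = torch.cat([memory, self.embed(seg)], dim=1)
            h = self.encoder(x)
            # The updated memory is whatever lands in the memory slots:
            # old content is compressed into the same flat bank every step.
            memory = h[:, : self.mem_tokens]
            outputs.append(h[:, self.mem_tokens :])
        return torch.cat(outputs, dim=1), memory
```

Because the bank is overwritten wholesale at each step, distant information competes for the same few slots, which is the selection-and-filtering limitation the abstract attributes to flat memory architectures.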

Read more at arxiv.org
