
HMT: Hierarchical Memory Transformer for Long Context Language Processing

Abstract: Transformer-based large language models (LLMs) have been widely used in language processing applications. However, most of them restrict the context window within which the model can attend to every token of the input. Previous work on recurrent models can memorize past tokens to enable unlimited context while remaining effective. However, these models have "flat" memory architectures, which are limited in how they select and filter information. Since humans are ...
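As a rough illustration (not taken from the paper), the "flat" memory design the abstract contrasts with HMT can be pictured as a single fixed-size bank of memory tokens carried across input segments, in the spirit of recurrent memory transformers: every past segment is compressed into the same bank, with no hierarchy for selecting or filtering what is kept. The sketch below is an assumption for illustration only; the module names, sizes, and update rule are not from the paper.

```python
# Minimal sketch of a "flat" segment-level memory recurrence (illustrative only,
# not HMT): one fixed-size memory bank is overwritten after every segment, so all
# past information is squeezed into the same slots with no selection hierarchy.
import torch
import torch.nn as nn

class FlatMemorySegmentModel(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, mem_tokens=8, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # A single flat bank of learned memory tokens.
        self.init_memory = nn.Parameter(torch.randn(1, mem_tokens, d_model))
        self.mem_tokens = mem_tokens

    def forward(self, segments):
        """segments: list of LongTensor [batch, seg_len] token-id chunks."""
        memory = self.init_memory.expand(segments[0].size(0), -1, -1)
        outputs = []
        for seg in segments:
            # Prepend the memory bank to the current segment and encode jointly.
            x = torch.cat([memory, self.embed(seg)], dim=1)
            h = self.encoder(x)
            # The updated memory is whatever lands in the memory slots:
            # old content is compressed into the same flat bank every step.
            memory = h[:, : self.mem_tokens]
            outputs.append(h[:, self.mem_tokens :])
        return torch.cat(outputs, dim=1), memory
```

Because the bank is overwritten wholesale at each step, distant information competes for the same few slots, which is the selection-and-filtering limitation the abstract attributes to flat memory architectures.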

Read more at arxiv.org
