
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Abstract: The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Megalodon inherits the architecture of Mega (exp...
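The core contrast in the abstract is between full attention, whose cost grows quadratically with sequence length, and sub-quadratic alternatives such as linear attention. A minimal illustrative sketch of that gap follows; it is not code from the Megalodon paper, and the kernel function and shapes are assumptions chosen for demonstration only.

```python
# Illustrative sketch (not from the paper): full attention materializes an
# n x n score matrix (quadratic in sequence length n), while kernelized
# "linear attention" reorders the matrix products to stay linear in n.
import numpy as np

def full_attention(Q, K, V):
    # Builds the (n, n) attention matrix explicitly -> O(n^2) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                  # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                       # (n, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernel trick: compute phi(Q) @ (phi(K)^T V), so the (n, n) matrix is
    # never formed -> O(n) in sequence length. phi here is an assumed
    # positive feature map, not the one used by any specific method.
    Qf, Kf = phi(Q), phi(K)                                  # (n, d)
    KV = Kf.T @ V                                            # (d, d)
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T                 # (n, 1) normalizer
    return (Qf @ KV) / Z                                     # (n, d)

# Toy usage: same input shapes, same output shape, very different scaling.
n, d = 2048, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(full_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

As the abstract notes, such sub-quadratic reformulations have tended to trade away pretraining efficiency and downstream accuracy relative to Transformers, which is the gap Megalodon targets.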

Read more at arxiv.org
