8.5
"Mamba-based Language Models Match or Outperform Transformers in Large-scale Training Studies: Hybrid Mamba-2 Model Tops Performance on 12 Standard Tasks and Is 8x Faster, Reports NVIDIA's Megatron-LM Project"
arxiv.org