News Score: Score the News, Sort the News, Rewrite the Headlines

State-space models can learn in-context by gradient descent

View PDF HTML (experimental) Abstract:Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning. We prove that a single structured state-space model layer, augmented with local self-att...

Read more at arxiv.org

© News Score  score the news, sort the news, rewrite the headlines