State-Space Models Demonstrate In-Context Learning Capability Through Gradient Descent, Matching Transformer Performance

State-space models can learn in-context by gradient descent

View PDF HTML (experimental) Abstract:Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning. We prove that a single structured state-space model layer, augmented with local self-att...