
IBM crossed a transformer with an SSM and got ‘Bamba’

The transformer architecture behind today's large language models has shown an uncanny ability to generate human-like text. Part of its effectiveness comes from its self-attention mechanism, which allows the model to weigh all the words in an input sequence when generating a response.

The problem comes as conversations get longer. Because the model holds the running sequence in memory as it responds, the cumulative cost of attention grows quadratically with the length of that sequence. If the size of the context window doubles,...
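As a rough illustration of the scaling the excerpt describes, here is a minimal sketch (not IBM's Bamba code; the function and sizes are hypothetical) showing that the self-attention score matrix has one entry per pair of tokens, so its size quadruples whenever the sequence length doubles.

```python
# Minimal sketch: why self-attention cost grows quadratically with
# sequence length. Each of the n tokens attends to all n tokens,
# so the score matrix has n * n entries.
import numpy as np

def attention_scores(q, k):
    """Scaled dot-product attention scores for one head.
    q, k: (n_tokens, d_model) arrays; returns an (n_tokens, n_tokens) matrix."""
    d = q.shape[-1]
    return q @ k.T / np.sqrt(d)

d_model = 64
for n in (1024, 2048, 4096):  # doubling the "context window" each time
    q = np.random.randn(n, d_model)
    k = np.random.randn(n, d_model)
    scores = attention_scores(q, k)
    print(n, scores.shape, scores.size)  # entries grow 4x each time n doubles
```

State-space models (SSMs) sidestep this by carrying a fixed-size hidden state instead of attending over the full history, which is the trade-off a transformer-SSM hybrid like Bamba is meant to exploit.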

Read more at research.ibm.com
