Long Convolutions for GPT-like Models: Polynomials, Fast Fourier Transforms and Causality
We’ve been writing a series of papers (1, 2, 3) that have at their core so-called long convolutions, with an aim towards enabling longer-context models. These are different from the 3x3 convolutions people grew up with in vision because, well, they are longer: in some cases, the filters are as long as the entire sequence. A frequent line of questions we get is about what these long convolutions are and how we compute them efficiently, so we put together a short tutorial. At the end of this, you’ll rem...
Read more at hazyresearch.stanford.edu
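
As a taste of where the tutorial goes: a causal convolution with a filter as long as the sequence costs O(L^2) if done naively, but O(L log L) via the FFT, as long as you zero-pad so that the circular convolution the FFT computes does not wrap future values around into the past. Below is a minimal sketch in PyTorch; the function name fft_long_conv and its interface are our own illustration for this excerpt, not the papers' actual code.

```python
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution y[t] = sum_{s<=t} k[s] * u[t-s],
    computed in O(L log L) with the FFT.

    u: (batch, L) input sequences
    k: (L,) filter as long as the sequence
    """
    L = u.shape[-1]
    # Zero-pad to 2L: the FFT computes a *circular* convolution, and
    # padding ensures the first L outputs match the linear (causal) one.
    n = 2 * L
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    # Pointwise multiply in frequency space, transform back, keep L outputs.
    return torch.fft.irfft(u_f * k_f, n=n)[..., :L]

# Sanity check against the direct O(L^2) definition.
torch.manual_seed(0)
u = torch.randn(2, 8)
k = torch.randn(8)
y_direct = torch.stack([
    torch.stack([(k[: t + 1].flip(0) * u[b, : t + 1]).sum() for t in range(8)])
    for b in range(2)
])
assert torch.allclose(fft_long_conv(u, k), y_direct, atol=1e-5)
```

Without the padding to 2L, the tail of the convolution would wrap around and contaminate early time steps, breaking causality; that interaction between the FFT and causality is exactly what the full post works through.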