Linear Transformers Are Faster After All
\[
\newcommand{\R}{\mathbb{R}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\sft}{\text{softmax}}
\newcommand{\List}{\text{List}}
\newcommand{\Seq}{\text{Seq}}
\newcommand{\SeqT}{\text{SeqT}}
\newcommand{\CSeqT}{\text{CSeqT}}
\newcommand{\Dist}{\text{Dist}}
\newcommand{\SM}{\text{SM}}
\newcommand{\Fn}{\text{Fn}}
\newcommand{\Tok}{\text{Tok}}
\newcommand{\Aij}{ A_{[i,j]}}
\]
It is well-known that removing the exponential from the attention layer of a transformer allows the attention computation to be rearranged so that its cost scales linearly, rather than quadratically, with the sequence length.
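As a minimal sketch of why this is so (using $Q, K, V \in \R^{n \times d}$ for the query, key, and value matrices; this notation is our shorthand here, not necessarily the post's), dropping the exponential and, for simplicity, ignoring the normalization turns attention into a chain of plain matrix products, and associativity lets the quadratic-in-$n$ grouping be replaced by a linear one:

\[
\sft\!\left(Q K^\top\right) V
\;\longrightarrow\;
\underbrace{\left(Q K^\top\right) V}_{O(n^2 d)}
\;=\;
\underbrace{Q \left(K^\top V\right)}_{O(n d^2)} .
\]

For long sequences ($n \gg d$), the right-hand grouping is linear rather than quadratic in the sequence length.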
Read more at manifestai.com