A ChatGPT clone, in 3000 bytes of C, backed by GPT-2
This program is a dependency-free implementation of GPT-2. It loads
the weight matrices and BPE vocabulary out of the original TensorFlow files,
tokenizes the input with a simple byte-pair encoder,
implements a basic linear algebra package with matrix math operations,
defines the transformer architecture, performs transformer inference,
and un-tokenizes the output with the BPE decoder.
All in ~3000 bytes of C.
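
To picture the tokenization step, here is a minimal sketch of byte-pair encoding in C: start from single characters, then repeatedly fuse adjacent pairs according to a ranked merge table. The `Merge` struct, `bpe_encode`, and the toy merge rules are hypothetical illustrations of the technique, not the program's actual data structures or code.

```c
/* Minimal BPE sketch: fuse adjacent token pairs in merge-rank order.
 * The merge table here is a toy; real GPT-2 has ~50k ranked merges. */
#include <stdio.h>
#include <string.h>

#define MAXTOK 1024

typedef struct { const char *left, *right; } Merge; /* hypothetical */

static const Merge merges[] = {
    {"h", "e"}, {"l", "l"}, {"he", "ll"}, {"hell", "o"},
};
static const int n_merges = sizeof(merges) / sizeof(merges[0]);

/* Encode: start from single characters, then apply merges by rank. */
static int bpe_encode(const char *text, char toks[][16]) {
    int n = 0;
    for (const char *p = text; *p && n < MAXTOK; p++, n++) {
        toks[n][0] = *p;
        toks[n][1] = '\0';
    }
    for (int r = 0; r < n_merges; r++) {
        for (int i = 0; i + 1 < n; i++) {
            if (!strcmp(toks[i], merges[r].left) &&
                !strcmp(toks[i + 1], merges[r].right)) {
                strcat(toks[i], toks[i + 1]);           /* fuse the pair   */
                memmove(toks + i + 1, toks + i + 2,
                        (n - i - 2) * sizeof(toks[0])); /* close the gap   */
                n--;
                i--;                                    /* re-check here   */
            }
        }
    }
    return n;
}

int main(void) {
    char toks[MAXTOK][16];
    int n = bpe_encode("hello", toks);
    for (int i = 0; i < n; i++)
        printf("[%s] ", toks[i]);  /* prints: [hello] */
    return 0;
}
```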
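Likewise, the "basic linear algebra package" can be pictured as little more than a row-major matrix multiply plus the GELU activation GPT-2 uses between layers. This is a sketch under those assumptions; the names `matmul` and `gelu` are illustrative, not taken from the source.

```c
/* A row-major matrix kernel of the kind transformer inference needs. */
#include <math.h>

/* C[m][n] = A[m][k] * B[k][n], all row-major float buffers. */
static void matmul(const float *A, const float *B, float *C,
                   int m, int k, int n) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int x = 0; x < k; x++)
                acc += A[i * k + x] * B[x * n + j];
            C[i * n + j] = acc;
        }
}

/* GPT-2's GELU activation (tanh approximation, 0.79788... = sqrt(2/pi)). */
static float gelu(float v) {
    return 0.5f * v *
           (1.0f + tanhf(0.7978845608f * (v + 0.044715f * v * v * v)));
}
```

A naive triple loop like this is slow but dependency-free, which fits the constraint of the whole program living in a few kilobytes of source.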
It's optimized efficiently enough that GPT-2 Small takes a few
seconds per reply on any modern machine.
Read more at nicholas.carlini.com