GitHub - AviSoori1x/makeMoE: From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
makeMoE
Developed using Databricks with ❤️
A sparse mixture of experts language model built from scratch, inspired by (and largely based on) Andrej Karpathy's makemore (https://github.com/karpathy/makemore) :)
Hugging Face community blog post that walks through the implementation: https://huggingface.co/blog/AviSoori1x/makemoe-from-scratch
Part #2 detailing expert capacity: https://huggingface.co/blog/AviSoori1x/makemoe2
This is an implementation of a sparse mixture of experts language model from scratch. This is inspired b...