makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch
Published January 23, 2024
TL;DR: This blog walks through implementing a sparse mixture of experts language model from scratch. It is inspired by and largely based on Andrej Karpathy's project 'makemore' and borrows a number of reusable components from that implementation. Like makemore, makeMoE is an autoregressive character-level language model, but it uses the aforementioned sparse mixture of experts architecture. The rest of the blog focuses on the key elements of this architecture and how they are implemented.
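To make the core idea concrete, here is a minimal sketch of a top-k gated sparse mixture-of-experts block in PyTorch. The class names (`Expert`, `SparseMoE`) and hyperparameters are illustrative assumptions for this sketch, not lifted verbatim from the makeMoE repository.

```python
# A minimal sketch of a top-k gated sparse mixture-of-experts block in PyTorch.
# Names and hyperparameters are illustrative, not taken from the makeMoE repo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A simple feed-forward expert, like a standard transformer MLP."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)


class SparseMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, n_embd, num_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(n_embd, num_experts)
        self.experts = nn.ModuleList([Expert(n_embd) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, n_embd)
        logits = self.router(x)                               # (B, T, num_experts)
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
        # Softmax only over the selected experts' logits
        gates = F.softmax(topk_logits, dim=-1)                # (B, T, top_k)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            # Positions where expert i is among the token's top-k choices
            mask = (topk_idx == i)                            # (B, T, top_k)
            if mask.any():
                weight = (gates * mask).sum(dim=-1, keepdim=True)  # (B, T, 1)
                out = out + weight * expert(x)
        return out


if __name__ == "__main__":
    moe = SparseMoE(n_embd=32)
    x = torch.randn(2, 8, 32)
    print(moe(x).shape)  # torch.Size([2, 8, 32])
```

The key design point is that the router's softmax is taken over only the top-k expert logits, so each token's output is a weighted combination of just a few experts rather than all of them.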