makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch
Published January 23, 2024
TL;DR: This blog walks through implementing a sparse mixture of experts language model from scratch. It is inspired by and largely based on Andrej Karpathy's project 'makemore' and borrows a number of reusable components from that implementation. Like makemore, makeMoE is an autoregressive character-level language model, but it uses the aforementioned sparse mixture of experts architecture. The rest of the blog focuses on the key elements of this architecture and how they are implemented.
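To make the core idea concrete, here is a minimal sketch of a top-k gated sparse mixture-of-experts block in PyTorch. The class names (`Expert`, `SparseMoE`) and hyperparameters are illustrative assumptions for this sketch, not lifted verbatim from the makeMoE repository.

```python
# A minimal sketch of a top-k gated sparse mixture-of-experts block in PyTorch.
# Names and hyperparameters are illustrative, not taken from the makeMoE repo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A simple feed-forward expert, like a standard transformer MLP."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)


class SparseMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, n_embd, num_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(n_embd, num_experts)
        self.experts = nn.ModuleList([Expert(n_embd) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, n_embd)
        logits = self.router(x)                               # (B, T, num_experts)
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
        # Softmax only over the selected experts' logits
        gates = F.softmax(topk_logits, dim=-1)                # (B, T, top_k)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            # Positions where expert i is among the token's top-k choices
            mask = (topk_idx == i)                            # (B, T, top_k)
            if mask.any():
                weight = (gates * mask).sum(dim=-1, keepdim=True)  # (B, T, 1)
                out = out + weight * expert(x)
        return out


if __name__ == "__main__":
    moe = SparseMoE(n_embd=32)
    x = torch.randn(2, 8, 32)
    print(moe(x).shape)  # torch.Size([2, 8, 32])
```

The key design point is that the router's softmax is taken over only the top-k expert logits, so each token's output is a weighted combination of just a few experts rather than all of them.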