Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
Chapter 5 of Sebastian Raschka's book
"Build a Large Language Model (from Scratch)"
explains how to train the LLM. There are a number of things in there that required
a bit of thought, so I'll post about each of them in turn.
The chapter starts off easily, with a few bits of code to generate some sample
text. Because there is a call to torch.manual_seed at the start to make the random
number generator deterministic, you can run the code and get exactly the same results.
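As a minimal sketch of that determinism (not the book's actual sample code): seeding PyTorch's global random number generator with `torch.manual_seed` means the same sequence of random draws is produced on every run, so any sampling-based text generation becomes reproducible.

```python
import torch

# Seed the global RNG, then draw some random values.
torch.manual_seed(123)
a = torch.rand(3)

# Re-seeding with the same value replays the same sequence,
# so the second draw is identical to the first.
torch.manual_seed(123)
b = torch.rand(3)

print(torch.equal(a, b))  # True
```

This is why two runs of the generation code print identical sample text: the token sampling consumes the same stream of random numbers each time.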
Read more at gilesthomas.com