Developer trains 163M-parameter GPT-2 base model from scratch in 48 hours on RTX 3090 using Hugging Face datasets, achieving near-original performance on consumer hardware.

Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090

Having worked through the main body of Sebastian Raschka's book "Build a Large Language Model (from Scratch)", I wanted to try an experiment: is it possible to train a base model of my own, on my own hardware? The book shows you how to train your LLM, walks through a basic training run on a small dataset, and then switches to downloading the "pre-cooked" weights from OpenAI. That makes sense given that not every reader will have access to enough hardware to really train fro...