What GPT-oss Leaks About OpenAI's Training Data
19th of September 2025
OpenAI recently released their open-weights model. Here we'll discuss how that inevitably leaks some information about their model training stack, and, on the way, show that GPT-5 was trained on phrases from adult websites.
What data does OpenAI train their models on? That is a well-protected trade secret of course, one with vested interest for the answer. While GPT-oss's weights are openly available, the sources of training data are not clearly described in the model card...
Read more at fi-le.net