Exploring Evaluation Methods for Fine-tuned Language Models in a Structured Data Generation Task: Alex Strick van Linschoten's Insights

Alex Strick van Linschoten - How to think about creating a dataset for LLM finetuning evaluation

I previously experimented with one-click LLM finetuning providers, and now is a good time to return to the core of the matter: evaluating how well all these fine-tuned models and experiments are faring. I have a gut feeling that my fine-tuned models did pretty well, but we’re not in the business of gut feelings, so I’m hoping to put some real numbers down to either prove or disprove this hypothesis. As a quick reminder, if you didn’t read any of the previous posts in the series, I’m build...