News Score: Score the News, Sort the News, Rewrite the Headlines

Classic ML to cope with Dumb LLM Judges

In previous posts I used a local LLM to choose which of two products was more relevant for a search query (see this GitHub repo). Using human labels in an open e-commerce search dataset (WANDS from Wayfair) as a baseline, I measure the LLM’s preference between products and check whether it matches the human raters. If it does, I can use my laptop as the search relevance judge, which can then guide search quality tuning and iteration without an expensive OpenAI bill. My goal, not so much to replace ot...
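To make the evaluation idea concrete, here is a minimal Python sketch of scoring a pairwise judge against human labels. It is not the code from the post or the repo: the numeric grades, the `(query, product_id)` keys, the `"LHS"`/`"RHS"` return values, and the `always_lhs` stub standing in for the local LLM call are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed layout: human relevance grades keyed by (query, product_id),
# where a higher number means more relevant.
HumanLabels = Dict[Tuple[str, str], int]
# A judge takes (query, lhs_product, rhs_product) and returns "LHS" or "RHS".
Judge = Callable[[str, str, str], str]


def human_preference(labels: HumanLabels, query: str,
                     lhs: str, rhs: str) -> Optional[str]:
    """Return "LHS" or "RHS" per the human grades, or None on a tie."""
    lhs_grade = labels[(query, lhs)]
    rhs_grade = labels[(query, rhs)]
    if lhs_grade == rhs_grade:
        return None
    return "LHS" if lhs_grade > rhs_grade else "RHS"


def agreement(labels: HumanLabels,
              pairs: List[Tuple[str, str, str]],
              judge: Judge) -> float:
    """Fraction of non-tied pairs where the judge matches the human raters."""
    compared = 0
    agreed = 0
    for query, lhs, rhs in pairs:
        expected = human_preference(labels, query, lhs, rhs)
        if expected is None:
            continue  # humans expressed no preference, so skip the pair
        compared += 1
        if judge(query, lhs, rhs) == expected:
            agreed += 1
    return agreed / compared if compared else 0.0


def always_lhs(query: str, lhs: str, rhs: str) -> str:
    """Hypothetical stub judge; a real one would prompt the local LLM."""
    return "LHS"
```

Skipping tied pairs is one design choice among several; a judge that is allowed to answer "neither" would need those pairs counted as well.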

Read more at softwaredoug.com
