LLMs are Bad Judges. So use Our Classifier Instead.
68 Pages
Posted: 8 Jul 2025
Last revised: 8 Jul 2025
Date Written: June 30, 2025
Abstract
Large Language Models suffer from prompt variance, meaning they’ll give you totally different legal answers depending on how you phrase your question. Jonathan Choi demonstrated this recently when he asked ChatGPT five legal questions, each rephrased 2,000 times, and watched as the bot spat out different answers every time. When you tell somebody that AI is going to replace the judge, the lawyer, and the le...
Read more at papers.ssrn.com