News Score: Score the News, Sort the News, Rewrite the Headlines

Evaluating LLMs for my personal use case

It’s great that AI can win maths Olympiads, but that’s not what I’m doing. I mostly ask basic Rust, Python, Linux and life questions, so I did my own evaluation. I gathered 130 real prompts from my bash history (I use the command-line tool llm). I had Qwen3 235B Thinking and Gemini 2.5 Pro group them into categories. Both chose very similar ones; broadly, with examples: Programming: “Write a bash script to ..”; Sysadmin: “With curl how do I ..”; Technical explanations: “Explain underlay netw...

Read more at darkcoding.net
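The excerpt mentions collecting real prompts from bash history. Here is one way to reproduce that step. It is a hypothetical sketch, not the author's method, and it assumes three things: the default history file at ~/.bash_history, invocations shaped like llm [options] "prompt" with the prompt as the last argument, and an illustrative list of llm subcommands to skip. The function name extract_llm_prompts and the NON_PROMPT_FIRST_ARGS set are made up for this example.

```python
import shlex
from pathlib import Path

# Assumption: first arguments that are llm subcommands rather than prompts.
# Illustrative, not exhaustive; adjust for your own history.
NON_PROMPT_FIRST_ARGS = {"models", "keys", "logs", "install", "templates", "plugins", "chat"}


def extract_llm_prompts(history_lines):
    """Return unique prompts, in first-seen order, from shell history lines.

    Assumes invocations shaped like: llm [options] "prompt",
    with the prompt as the last argument.
    """
    seen = set()
    prompts = []
    for line in history_lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: skip lines shlex cannot parse
        if len(tokens) < 2 or tokens[0] != "llm":
            continue  # not an llm invocation with arguments
        if tokens[1] in NON_PROMPT_FIRST_ARGS:
            continue  # a subcommand such as "llm models", not a prompt
        prompt = tokens[-1]
        if prompt.startswith("-") or prompt in seen:
            continue  # trailing option flag, or a repeated prompt
        seen.add(prompt)
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    history = Path.home() / ".bash_history"  # assumed default location
    if history.exists():
        lines = history.read_text(errors="replace").splitlines()
        for p in extract_llm_prompts(lines):
            print(p)
```

Deduplicating matters because shell history often repeats the same command. The resulting list is what you would then hand to the models for categorisation.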
