Tech Enthusiast Evaluates 11 Large Language Models for Personal Use: Compares Performance, Cost, and Speed

Evaluating LLMs for my personal use case

It’s great that AI can win maths Olympiads, but that’s not what I’m doing. I mostly ask basic Rust, Python, Linux and life questions. So I did my own evaluation. I gathered 130 real prompts from my bash history (I use the command-line tool llm). I had Qwen3 235B Thinking and Gemini 2.5 Pro group them into categories. Both chose very similar ones, broadly (with examples):

Programming - “Write a bash script to ..”
Sysadmin - “With curl how do I ..”
Technical explanations - “Explain underlay netw...