"OpenAI's New GPT-4 Turbo with Vision Underperforms in Coding Tasks, Scores Lowest in Aider’s Benchmark Suites"

GPT-4 Turbo with Vision is a step backwards for coding

OpenAI just released GPT-4 Turbo with Vision and it performs worse on aider’s coding benchmark suites than all the previous GPT-4 models. In particular, it seems much more prone to “lazy coding” than the existing GPT-4 Turbo “preview” models.

Code editing skill

Aider relies on a code editing benchmark to quantitatively evaluate how well an LLM can make changes to existing code. The benchmark uses aider to try and complete 133 Exercism Python coding exercises. For each exercise, the LLM gets two ...