GPT-4 Turbo with Vision is a step backwards for coding
OpenAI just released GPT-4 Turbo with Vision,
and it performs worse on aider’s coding benchmark suites than all the previous GPT-4 models.
In particular, it seems much more prone to “lazy coding” than the
existing GPT-4 Turbo “preview” models.
Code editing skill
Aider relies on a
code editing benchmark
to quantitatively evaluate how well
an LLM can make changes to existing code.
The benchmark uses aider to try to complete
133 Exercism Python coding exercises.
For each exercise, the LLM gets two ...
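As a rough illustration of how a benchmark like this can be reduced to a single score, here is a minimal sketch that computes the percentage of exercises whose tests pass. This is not aider's actual harness: the `ExerciseResult` type, the `pass_rate` function, and the sample exercise names and outcomes are all hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ExerciseResult:
    """Outcome of one exercise (hypothetical structure, not aider's own)."""
    name: str
    passed: bool  # True if the exercise's tests passed after the LLM's edits


def pass_rate(results: list[ExerciseResult]) -> float:
    """Return the percentage of exercises that passed, or 0.0 if there are none."""
    if not results:
        return 0.0
    passed_count = sum(1 for r in results if r.passed)
    return 100.0 * passed_count / len(results)


# Made-up outcomes for four exercises, purely for illustration.
sample = [
    ExerciseResult("exercise-a", True),
    ExerciseResult("exercise-b", False),
    ExerciseResult("exercise-c", True),
    ExerciseResult("exercise-d", True),
]
print(f"{pass_rate(sample):.1f}% of exercises passed")
```

A real run would cover all 133 exercises, so a difference of one exercise moves the score by roughly 0.75 percentage points.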