Introducing the SWE-Lancer benchmark
Can frontier LLMs earn $1 million from real-world freelance software engineering?
We introduce SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. SWE-Lancer encompasses both independent engineering tasks (ranging from $50 bug fixes to $32,000 feature implementations) and managerial tasks, where models choose between technical implementation proposals. Independ...
Read more at openai.com