News Score: Score the News, Sort the News, Rewrite the Headlines

SWE-Bench+: Enhanced Coding Benchmark for LLMs

Abstract: Large Language Models (LLMs) in Software Engineering (SE) can offer assistance for coding. To facilitate a rigorous evaluation of LLMs in practical coding contexts, Jimenez et al. introduced the SWE-bench dataset, which comprises 2,294 real-world GitHub issues and their corresponding pull requests, collected from 12 widely used Python repositories. Several impressive LLM-based toolkits have recently been developed and evaluated on this dataset. However, a systemati...

Read more at arxiv.org
