New Benchmark MultiCodeBench Tests LLMs' Code Generation Across 12 Domains, 15 Languages; Reveals Performance Insights for Developers

How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation

Abstract: Recently, an increasing number of AI-driven programming assistants powered by code LLMs have been integrated into various real-world software development environments, significantly boosting developer productivity. However, existing code generation benchmarks primarily focus on general-purpose scenarios, leaving the code generation performance of LLMs for specific application domains largely unknown. In this paper, we introduce a new benchmark, MultiCodeBenc...