How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation
Abstract: Recently, an increasing number of AI-driven programming assistants powered by code LLMs have been integrated into various real-world software development environments, significantly boosting developer productivity. However, existing code generation benchmarks primarily focus on general-purpose scenarios, leaving the code generation performance of LLMs for specific application domains largely unknown. In this paper, we introduce a new benchmark, MultiCodeBenc...