New AI Fine-Tuning Method Boosts LLM Performance: Researchers Optimize 'Best-of-N' Strategy, Improving Gemma 2B's Math and Coding Scores

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Abstract: Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective Best-of-N (BoN) inference strategy, in which a verifier selec...
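
As a rough illustration of the Best-of-N strategy named in the abstract, the sketch below assumes the common formulation: sample N candidate responses from the model, score each with a verifier, and return the top-scoring one. The abstract is truncated here, so this is not necessarily the paper's exact setup, and it shows only the inference-time selection step, not the fine-tuning method. The names `best_of_n`, `generate`, and `score` are hypothetical stand-ins for a policy model and a verifier.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int,
) -> str:
    """Sample n candidate responses and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Hypothetical policy call: each invocation is assumed to return
    # one independently sampled response for the prompt.
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # Hypothetical verifier: a higher score means a better response.
    # max() keeps the first candidate when scores tie.
    return max(candidates, key=lambda response: score(prompt, response))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Toy "model" that guesses a digit, and a toy "verifier" that
    # prefers guesses closer to the correct answer 4.
    guess = lambda prompt: str(rng.randint(0, 9))
    closeness = lambda prompt, response: -abs(int(response) - 4)
    print(best_of_n("What is 2 + 2?", guess, closeness, n=8))
```

In this toy setup, a larger `n` gives the verifier more candidates to choose from, which is the sense in which BoN spends extra inference-time compute to improve the final answer.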