News Score: Score the News, Sort the News, Rewrite the Headlines

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Abstract: Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective Best-of-N (BoN) inference strategy, in which a verifier selec...
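The abstract is cut off before it finishes describing Best-of-N, so the following is only a minimal sketch of the strategy as it is commonly formulated: draw N candidate responses, score each with a verifier, and return the highest-scoring one. It is not the paper's fine-tuning method. The names `best_of_n`, `toy_generate`, and `toy_score` are hypothetical stand-ins introduced for illustration; a real setup would call an LLM and a learned or rule-based verifier instead.

```python
import random
from typing import Callable, List


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int) -> str:
    """Draw n candidate responses and return the one the verifier scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # max() keeps the first candidate among ties.
    return max(candidates, key=lambda c: score(prompt, c))


# Toy stand-ins (assumptions for illustration, not part of the paper).
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    # Hypothetical "model": a noisy guess at the answer to 17 + 25.
    return str(42 + rng.choice([-2, -1, 0, 0, 1, 2]))


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical verifier: answers closer to 42 score higher.
    return -abs(int(response) - 42)


answer = best_of_n("What is 17 + 25?", toy_generate, toy_score, n=8)
print(answer)
```

In this sketch, a larger `n` gives the verifier more candidates to choose from, which is the sense in which Best-of-N spends extra inference-time compute to improve the selected response.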

Read more at arxiv.org
