Study: Direct Alignment Algorithms' Performance Relies on Pairwise vs. Pointwise Objectives, Not Specific Rewards or Losses

The Differences Between Direct Alignment Algorithms are a Blur

Abstract: Direct Alignment Algorithms (DAAs) simplify language model alignment by replacing reinforcement learning (RL) and reward modeling (RM) in Reinforcement Learning from Human Feedback (RLHF) with direct policy optimization. DAAs can be classified by their ranking losses (pairwise vs. pointwise), by the rewards used in those losses (e.g., likelihood ratios of policy and reference policy, or odds ratios), or by whether a Supervised Fine-Tuning (SFT) phase is requ...
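To make the two classification axes in the abstract concrete, here is a minimal Python sketch of how a pairwise and a pointwise ranking loss might each be built on top of either reward type. Everything in it is an illustrative assumption rather than the paper's formulation: the function names, the scaling parameter `beta`, the use of scalar sequence log-probabilities, and the specific sigmoid-based loss forms are hypothetical choices made only to show how the pieces fit together.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ratio_reward(logp_policy: float, logp_ref: float, beta: float = 1.0) -> float:
    """Assumed likelihood-ratio reward: beta * log(pi(y|x) / pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def odds_reward(logp_policy: float, beta: float = 1.0) -> float:
    """Assumed odds-ratio reward: beta * log(p / (1 - p)).

    Needs no reference policy; requires logp_policy < 0 so that p < 1.
    """
    p = math.exp(logp_policy)
    return beta * (logp_policy - math.log1p(-p))


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise form: depends only on the reward margin between the two responses."""
    return -log_sigmoid(r_chosen - r_rejected)


def pointwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pointwise form (one assumed variant): each response is scored on its own."""
    return -log_sigmoid(r_chosen) - log_sigmoid(-r_rejected)


if __name__ == "__main__":
    # Hypothetical sequence log-probabilities for a chosen and a rejected response.
    r_w = ratio_reward(logp_policy=-10.0, logp_ref=-12.0, beta=0.1)
    r_l = ratio_reward(logp_policy=-15.0, logp_ref=-13.0, beta=0.1)
    print("pairwise: ", pairwise_loss(r_w, r_l))
    print("pointwise:", pointwise_loss(r_w, r_l))
```

In this sketch the pairwise loss is unchanged when the same constant is added to both rewards, because only the margin enters it, whereas the pointwise loss is sensitive to each reward's absolute value. The reward function and the loss form are independent choices here, which mirrors the abstract's point that DAAs can be grouped along either axis.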