Score: 8.7
"GitHub Project Offers Comprehensive Toolkit for Reinforcement Learning from Human Feedback (RLHF), Supports Fine-Tuning, Reward Model Training, and PPO & DPO Algorithms on Alpaca, LLaMA, LLaMA2 Models"
github.com