OpenPipe Founder Uses RL, $4.80 GPU Time to Build Model Predicting HN Post Success; Explains RLHF Technique

Using Reinforcement Learning and $4.80 of GPU Time to Find the Best HN Post Ever (RLHF Part 1) - OpenPipe

Background: I’m Kyle, the founder of OpenPipe. OpenPipe is a managed fine-tuning service that makes it easy to build your own LLMs that achieve very high accuracy on a specific task. In this post we’ll go under the covers and explain RLHF, which is one of the techniques we use to accomplish this.

What do the following Hacker News stories have in common?

None reached the front page; in fact none of them even got any upvotes! But they were all identified by a fine-tuned model as being likely to do w...