News Score: Score the News, Sort the News, Rewrite the Headlines

Getting 50% (SoTA) on ARC-AGI with GPT-4o

I recently got to 50% accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go to the next section). I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs...
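As a rough illustration of the selection step described in the excerpt, here is a minimal Python (3.9+) sketch. It assumes each sampled program is a Python function mapping an input grid (a list of lists of ints) to an output grid. The names `passes_examples` and `select_output`, the toy candidate functions, and the majority vote among surviving programs are hypothetical choices made for illustration. This is not the author's implementation, and the GPT-4o sampling step is omitted entirely.

```python
from collections import Counter
from typing import Callable, Optional

Grid = list[list[int]]
Program = Callable[[Grid], Grid]


def passes_examples(program: Program, examples: list[tuple[Grid, Grid]]) -> bool:
    """True if the candidate reproduces every example output exactly."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A candidate that crashes on an example is treated as incorrect.
            return False
    return True


def select_output(
    candidates: list[Program],
    examples: list[tuple[Grid, Grid]],
    test_input: Grid,
) -> Optional[Grid]:
    """Keep candidates that fit all examples, then apply them to the test input.

    Hypothetical tie-break (an assumption, not from the source): when surviving
    programs disagree on the test input, return the most common predicted grid.
    """
    survivors = [p for p in candidates if passes_examples(p, examples)]
    predictions = []
    for p in survivors:
        try:
            predictions.append(p(test_input))
        except Exception:
            continue  # ignore survivors that crash on the test input
    if not predictions:
        return None
    # Lists are unhashable, so count grids as tuples of tuples.
    votes = Counter(tuple(map(tuple, grid)) for grid in predictions)
    winner, _ = votes.most_common(1)[0]
    return [list(row) for row in winner]


# Toy candidates standing in for sampled programs (hypothetical).
def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def flip_h(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


examples = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(select_output([identity, flip_h, transpose], examples, [[5, 6]]))  # [[6, 5]]
```

Only `flip_h` matches the example pair here, so its prediction on the test input is returned. With thousands of candidates, several programs may fit every example yet disagree on the test input, which is why the sketch needs some tie-breaking rule; the vote shown is just one simple option.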

Read more at redwoodresearch.substack.com
