"Redwood Research Achieves 50% Accuracy on ARC-AGI Public Test Set using GPT-4o, Surpassing Prior State-of-the-Art Performance"

Getting 50% (SoTA) on ARC-AGI with GPT-4o

I recently got to 50% accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on whether each Python program produces the correct outputs on the examples (if this is confusing, go to the next section). I use a variety of additional approaches and tweaks which, overall, substantially improve the performance of my method relative to just sampling 8,000 programs...
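The selection step can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: grids are assumed to be lists of integer lists, the candidate functions are hypothetical stand-ins for programs sampled from GPT-4o, and the final majority vote among surviving programs is an assumed selection rule that the excerpt does not describe.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def passes_all_examples(program: Program, examples: List[Example]) -> bool:
    """True only if the program reproduces every example output exactly."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # Sampled programs can crash; treat a crash as a wrong answer.
            return False
    return True


def select_prediction(
    programs: List[Program], examples: List[Example], test_input: Grid
) -> Optional[Grid]:
    """Keep programs that pass all examples, then majority-vote on the test output.

    The majority vote is an assumed rule for illustration only.
    """
    votes: Counter = Counter()
    for program in programs:
        if not passes_all_examples(program, examples):
            continue
        try:
            # Convert to nested tuples so the predicted grid is hashable.
            key = tuple(tuple(row) for row in program(test_input))
        except Exception:
            continue
        votes[key] += 1
    if not votes:
        return None  # no candidate survived the example check
    best, _ = votes.most_common(1)[0]
    return [list(row) for row in best]


# Hypothetical candidates standing in for sampled GPT-4o programs.
def flip_slice(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_reversed(g: Grid) -> Grid:
    return [list(reversed(row)) for row in g]


def flip_only_if_narrow(g: Grid) -> Grid:
    # Overfits: matches the 2-column examples but is the identity on wider grids.
    return [row[::-1] for row in g] if len(g[0]) == 2 else [row[:] for row in g]


def flip_vertical(g: Grid) -> Grid:
    return g[::-1]


def crashes(g: Grid) -> Grid:
    return g[5]  # raises IndexError on small grids


examples = [
    ([[1, 0], [0, 1]], [[0, 1], [1, 0]]),
    ([[2, 2], [0, 0]], [[2, 2], [0, 0]]),
]
candidates = [flip_slice, flip_reversed, flip_only_if_narrow, flip_vertical, crashes]
print(select_prediction(candidates, examples, [[3, 0, 0], [0, 4, 0]]))
# → [[0, 0, 3], [0, 4, 0]]
```

The overfit `flip_only_if_narrow` passes both examples yet predicts a different test output, which illustrates why passing the examples alone does not always single out one answer and some further rule is needed to choose among the surviving programs.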

Read more at redwoodresearch.substack.com
