GPT-4o’s Memory Breakthrough! (NIAN code)
needle-in-a-needlestack
Needle in a Needlestack (NIAN) is a new benchmark that measures how well LLMs pay attention to the information in their context
window. NIAN builds a prompt containing thousands of limericks and then asks a question about one limerick
at a specific location in that prompt. Here is an example prompt that includes about 2,500 limericks. Until today, no
LLM was very good at this benchmark. Here are GPT-4 Turbo and Claude-3 Sonnet’s attempts at this benchmark:
gpt-4-turbo-2024-04-09
claude-...
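
The prompt construction described above can be sketched roughly as follows. This is a minimal illustration and not the actual NIAN code: the function name build_nian_prompt, the prompt wording, and the toy filler text are all assumptions made for the example.

```python
def build_nian_prompt(limericks, needle, position, question):
    """Place `needle` at index `position` among `limericks`, then append a question.

    Hypothetical helper for illustration; the real benchmark's prompt format may differ.
    """
    if not 0 <= position <= len(limericks):
        raise ValueError("position must be between 0 and len(limericks)")
    # Insert the limerick being tested at the chosen location in the stack.
    stack = limericks[:position] + [needle] + limericks[position:]
    body = "\n\n".join(stack)
    # The question comes after the whole stack, so the model has to locate the needle.
    return f"{body}\n\nQuestion: {question}\nAnswer using only the limericks above."


# Toy data: placeholder text, not limericks taken from the benchmark.
filler = [f"Filler limerick number {i}." for i in range(5)]
needle = "A needle limerick about a cat named Moe."
prompt = build_nian_prompt(filler, needle, 2, "What is the cat's name?")
```

Varying position across runs, with a few thousand filler limericks in place of five, is what would let a benchmark like this show how answer quality changes with where the needle sits in the context window.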
Read more at nian.llmonpy.ai