"GPT-4o Excels in Needle in a Needlestack Benchmark, Outpacing Predecessor GPT-4 Turbo in Critical Memory Test"

GPT-4o’s Memory Breakthrough! (NIAN code)

Needle in a Needlestack (NIAN) is a new benchmark that measures how well LLMs pay attention to the information in their context window. NIAN builds a prompt containing thousands of limericks and then asks a question about one limerick at a specific location. An example prompt includes roughly 2,500 limericks.

Until today, no LLM was very good at this benchmark. Here are GPT-4 Turbo and Claude-3 Sonnet’s attempts at this benchmark: gpt-4-turbo-2024-04-09 claude-...
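To make the prompt construction described above concrete, here is a minimal Python sketch. It is not the NIAN code itself: the function name `build_nian_prompt`, its parameters, the blank-line separator, and the `Question:` suffix are all assumptions chosen for illustration. The sketch places one target limerick at a chosen index among filler limericks and appends a question about it.

```python
from typing import List


def build_nian_prompt(
    limericks: List[str], needle: str, position: int, question: str
) -> str:
    """Place `needle` at index `position` among `limericks`, then append `question`.

    Hypothetical helper for illustration; the real benchmark may format
    its prompts differently.
    """
    if not 0 <= position <= len(limericks):
        raise ValueError("position must be between 0 and len(limericks)")
    # Copy so the caller's list is not mutated.
    stack = list(limericks)
    stack.insert(position, needle)
    # Assumed layout: limericks separated by blank lines, question last.
    body = "\n\n".join(stack)
    return f"{body}\n\nQuestion: {question}"
```

Calling this helper repeatedly with the same limericks and question but different `position` values would show whether a model's answers depend on where the target limerick sits in the context window.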