News Score: Score the News, Sort the News, Rewrite the Headlines

Expensively Quadratic: the LLM Agent Cost Curve - exe.dev blog

Pop quiz: at what context length do a coding agent's cached reads account for half the cost of its next API call? By 50,000 tokens, your conversation's costs are probably dominated by cache reads. Let's take a step back. We've previously written about how coding agents work: they post the conversation so far to the LLM and keep doing so in a loop for as long as the LLM requests tool calls. When there are no more tools to run, the loop waits for user input, and the whole cyc...
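The loop described above can be sketched as follows. This is a minimal illustration, not exe.dev's actual implementation: `call_llm` and `run_tool` are hypothetical stand-ins for an LLM API call and a tool executor. The key detail is that the entire conversation is re-sent on every iteration, which is what drives the cost curve the post is about.

```python
def run_turn(conversation, call_llm, run_tool):
    """Run one user turn of a coding agent: keep calling the LLM while it
    requests tool calls, then return once it produces a plain reply.

    `call_llm` and `run_tool` are hypothetical callables, not a real API.
    """
    while True:
        reply = call_llm(conversation)  # the entire history is re-sent each call
        conversation.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply  # no more tools to run: hand control back to the user
        for call in tool_calls:
            # Append each tool result so the next LLM call can see it.
            conversation.append({"role": "tool", "content": run_tool(call)})
```

Because every `call_llm` re-sends the full history, a conversation of n turns transmits on the order of n² tokens in total, even though only a linear number of them are new.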
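A rough back-of-the-envelope sketch makes the 50,000-token claim plausible. The prices below are illustrative assumptions (loosely modeled on the common pattern of cached input costing about a tenth of fresh input), not the post's actual numbers:

```python
# Hypothetical per-million-token prices, chosen only to illustrate the
# cached-vs-fresh ratio; real API pricing varies by provider and model.
CACHE_READ_PER_MTOK = 0.30   # previously seen tokens, served from cache
FRESH_INPUT_PER_MTOK = 3.00  # newly written tokens (10x the cached rate)

def turn_costs(num_turns: int, tokens_per_turn: int):
    """Return (cache_read_cost, fresh_input_cost) in dollars after
    num_turns agent turns, assuming each turn adds tokens_per_turn new
    tokens and re-sends everything before it as a cache read."""
    cache_tokens = 0
    fresh_tokens = 0
    context = 0
    for _ in range(num_turns):
        cache_tokens += context          # everything so far is re-read
        fresh_tokens += tokens_per_turn  # only the new tokens are fresh
        context += tokens_per_turn
    return (cache_tokens * CACHE_READ_PER_MTOK / 1e6,
            fresh_tokens * FRESH_INPUT_PER_MTOK / 1e6)
```

Under these assumed prices, 25 turns of 2,000 tokens each (a 50,000-token context) accumulate 600,000 cached-read tokens against 50,000 fresh ones: $0.18 of cache reads versus $0.15 of fresh input. The quadratic term has just overtaken the linear one, and it only gets worse from there.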

Read more at blog.exe.dev
