Web Bench - A new way to compare AI Browser Agents
TL;DR: Web Bench is a new dataset for evaluating web browsing agents. It consists of 5,750 tasks on 452 different websites, of which 2,454 tasks are open-sourced. Anthropic Sonnet 3.7 CUA is the current SOTA, with the detailed results here.
Over the past few months, web browsing agents such as Skyvern, Browser-use and OpenAI's Operator (CUA) have taken the world by storm. These agents have been used in production for a variety of tasks, from helping people apply to jobs, downloading invoices, and eve...
Read more at blog.skyvern.com