"Reworkd's Tarsier: An Innovative Vision Utility Enhancing Web Interaction Agents, Outperforms GPT-4V on Performance Benchmarks"

GitHub - reworkd/tarsier: Vision utilities for web interaction agents 👀

Tarsier

If you've tried using an LLM to automate web interactions, you've probably run into questions like:

- How should you feed the webpage to an LLM? (e.g. HTML, Accessibility Tree, Screenshot)
- How do you map LLM responses back to web elements?
- How can you inform a text-only LLM about the page's visual structure?

At Reworkd, we iterated on all these problems across tens of thousands of real web tasks to bui...