GitHub - reworkd/tarsier: Vision utilities for web interaction agents 👀
🙈 Vision utilities for web interaction agents 🙈
Tarsier
If you've tried using an LLM to automate web interactions, you've probably run into questions like:
How should you feed the webpage to an LLM? (e.g. HTML, Accessibility Tree, Screenshot)
How do you map LLM responses back to web elements?
How can you inform a text-only LLM about the page's visual structure?
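To make the second question concrete, here is a minimal sketch of one way to map an LLM response back to a web element: give each interactable element a numeric tag, show the tags in the text the model reads, and look the tag up again when the model answers. This is an illustration under stated assumptions, not Tarsier's actual API. The `Element` class, the `tag_elements` and `resolve` helpers, the `[N]` tag format, and the sample XPaths are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    xpath: str  # where the element lives in the page (hypothetical sample values)
    text: str   # visible label shown to the model


def tag_elements(elements):
    """Give each element a numeric ID; return the tagged text and an ID -> xpath map."""
    tag_to_xpath = {i: el.xpath for i, el in enumerate(elements)}
    page_text = "\n".join(f"[{i}] {el.text}" for i, el in enumerate(elements))
    return page_text, tag_to_xpath


def resolve(llm_response, tag_to_xpath):
    """Return the xpath for the first bracketed ID in the response, or None."""
    match = re.search(r"\[(\d+)\]", llm_response)
    if match is None:
        return None
    return tag_to_xpath.get(int(match.group(1)))


# Hypothetical page with two interactable elements.
elements = [
    Element("//input[@name='q']", "Search"),
    Element("//button[@type='submit']", "Go"),
]
page_text, tag_to_xpath = tag_elements(elements)

# Pretend the model answered with an action that references a tag.
target = resolve("CLICK [1]", tag_to_xpath)
```

The model only ever sees the short tagged text, while the lookup table stays on the caller's side, so a response that names an unknown tag resolves to `None` instead of a wrong element.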
At Reworkd, we iterated on all these problems across tens of thousands of real web tasks to bui...