GitHub - microsoft/OmniParser: A simple screen parsing tool towards pure vision based GUI agent
OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent
📢 [Project Page] [V2 Blog Post] [Models V2] [Models V1.5] [huggingface space (to be updated)]
OmniParser is a comprehensive method for parsing user interface screenshots into structured, easy-to-understand elements. This significantly improves the ability of GPT-4V to generate actions that are accurately grounded in the corresponding regions of the interface.
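To make "structured elements" and "grounded actions" concrete, here is a minimal sketch of how a parsed screen might be represented and turned into a click target. It does not use OmniParser's actual API or output schema: the `ParsedElement` class, its fields, the normalized bounding-box convention, and the `click_point` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParsedElement:
    """One UI element found in a screenshot (hypothetical schema, not OmniParser's)."""
    element_id: int
    kind: str          # assumed categories, e.g. "icon" or "text"
    description: str   # short caption an agent can reason about
    bbox: Tuple[float, float, float, float]  # assumed normalized (x1, y1, x2, y2)


def click_point(element: ParsedElement, width: int, height: int) -> Tuple[int, int]:
    """Ground an element to a pixel coordinate: the center of its bounding box."""
    x1, y1, x2, y2 = element.bbox
    return round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height)


# Hand-written example data standing in for a parser's output.
elements = [
    ParsedElement(0, "text", "Search", (0.10, 0.05, 0.30, 0.10)),
    ParsedElement(1, "icon", "Settings", (0.50, 0.25, 0.75, 0.50)),
]

# An agent that picks element 1 can translate that choice into a click location.
target = next(e for e in elements if e.element_id == 1)
print(click_point(target, width=1000, height=800))  # (625, 300)
```

The point of the sketch is the division of labor: the model only has to choose an element by ID or description, and the bounding box supplies the exact screen region for the action.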
News
[2025/2] We release OmniParser V2 checkpoints. Watch Video
[2...