ScreenAI: A visual language model for UI and visually-situated language understanding
We introduce ScreenAI, a vision-language model for user interfaces and infographics that achieves state-of-the-art results on UI- and infographic-based tasks. We are also releasing three new datasets: Screen Annotation, to evaluate the model's layout-understanding capability, as well as ScreenQA Short and Complex ScreenQA, for a more comprehensive evaluation of its question-answering capability.
Screen user interfaces (UIs) and infographics, such as charts, diagrams and tables, play important roles in human ...
Read more at research.google