ScreenAI: A visual language model for UI and visually-situated language understanding
We introduce ScreenAI, a vision-language model for user interfaces and infographics that achieves state-of-the-art results on UI- and infographic-based tasks. We are also releasing three new datasets: Screen Annotation, to evaluate the model's layout-understanding capability, as well as ScreenQA Short and Complex ScreenQA, for a more comprehensive evaluation of its question-answering capability.
Screen user interfaces (UIs) and infographics, such as charts, diagrams and tables, play important roles in human ...
Read more at research.google