News Score: Score the News, Sort the News, Rewrite the Headlines

Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments

Abstract: This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments. We present a curated dataset containing 1,477 manually annotated frames spanning diverse domains, including code editors, news broadcasts, YouTube videos, and advertisements. Three state-of-the-art VLMs (Claude-3, Gemini-1.5, and GPT-4o) are benchmarked against traditional OCR systems such as ...

Read more at arxiv.org
