News Score: Score the News, Sort the News, Rewrite the Headlines

OmniVision-968M: World's Smallest Vision Language Model

Omnivision is a compact, sub-billion-parameter (968M) multimodal model that processes both visual and text inputs and is optimized for edge devices. Building on LLaVA's architecture, it features:

- 9x Token Reduction: Reduces image tokens from 729 to 81, cutting latency and computational cost.
- Enhanced Accuracy: Reduces hallucinations using DPO training on trustworthy data.

Demo
(OmniVision-generated description for an image with multiple objects)
(OmniVision generated des...
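The excerpt does not say how the image tokens are reduced from 729 to 81. As a rough sketch only, one way to get a 9x reduction is to concatenate every 9 consecutive token vectors into one wider vector, so the token count drops while the per-token width grows. The grouping strategy, the `group_tokens` helper, and the `HIDDEN` size below are illustrative assumptions, not details confirmed by the source.

```python
def group_tokens(tokens, group_size):
    """Concatenate each run of `group_size` consecutive token vectors into one."""
    if len(tokens) % group_size != 0:
        raise ValueError("token count must be divisible by group_size")
    grouped = []
    for start in range(0, len(tokens), group_size):
        merged = []
        for vec in tokens[start:start + group_size]:
            merged.extend(vec)  # widen the vector instead of dropping values
        grouped.append(merged)
    return grouped


HIDDEN = 4  # hypothetical embedding width, kept tiny for the example

# 729 placeholder image-token vectors (dummy values, not real embeddings).
image_tokens = [[float(i)] * HIDDEN for i in range(729)]

# Assumed grouping: 9 tokens merged into 1, giving 729 / 9 = 81 tokens.
reduced = group_tokens(image_tokens, 9)

print(len(image_tokens), "->", len(reduced))  # token count before and after
print(len(reduced[0]))                        # width of each merged token
```

Under this assumption the language model attends over 81 image positions instead of 729, which is where the latency and compute savings would come from; the information from the original tokens is kept in the wider vectors and not discarded.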

Read more at nexa.ai
