"LLaVA-1.6 Unveiled: Enhanced Reasoning, OCR, and World Knowledge, Outperforms Gemini Pro—New Release to Boost Large Multimodal Models"

LLaVA-1.6: Improved reasoning, OCR, and world knowledge

In October 2023, we released LLaVA-1.5 with a simple and efficient design along with great performance on a benchmark suite of 12 datasets. It has since served as the foundation of many comprehensive studies of the data, models, and capabilities of large multimodal models (LMMs), and has enabled various new applications. Today, we are thrilled to present LLaVA-1.6, with improved reasoning, OCR, and world knowledge. LLaVA-1.6 even exceeds Gemini Pro on several benchmarks. Compared with LLaVA-1.5, LLaVA...