"Together.AI Launches Dragonfly: Breakthrough Vision-Language Model Boosting Fine-Grained Visual Understanding; Unveils Open-Source Models Trained on Over 5.5 Million Image-Instruction Pairs"

Dragonfly: A large vision-language model with multi-resolution zoom

We are excited to announce the launch of Dragonfly, a breakthrough instruction-tuning vision-language architecture that enhances fine-grained visual understanding and reasoning about image regions. We are releasing the Dragonfly architecture, which uses multi-resolution zoom-and-select to enhance multi-modal reasoning while remaining context-efficient. We are also launching two new open-source models: Llama-3-8b-Dragonfly-v1, a general-domain model trained on 5.5 million image-instruction pairs, and ...