Dragonfly: A large vision-language model with multi-resolution zoom
We are excited to announce the launch of Dragonfly, a breakthrough instruction-tuning Vision-language architecture, that enhances fine-grained visual understanding and reasoning about image regions. We are releasing the Dragonfly architecture, which uses multi-resolution zoom-and-select to enhance multi-modal reasoning while being context-efficient. We are also launching two new open-source models Llama-3-8b-Dragonfly-v1 a general-domain model trained on 5.5 million image-instruction pairs and ...
Read more at together.ai