Ollama's new engine for multimodal models · Ollama Blog
Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:
Meta Llama 4
Google Gemma 3
Qwen 2.5 VL
Mistral Small 3.1
and more vision models.
General Multimodal Understanding & Reasoning
Llama 4 Scout
ollama run llama4:scout
(Note: this is a 109 billion parameter, mixture-of-experts model.)
Example: asking location-based questions about a video frame:
You can then ask follow-up questions:
ollama@ollamas-computer ~ % ollama run llama4:scout
>>> what ...
Read more at ollama.com