Ollama Launches New Engine for Multimodal AI Models, Enabling Vision and Language Tasks Across Multiple Platforms

Ollama's new engine for multimodal models · Ollama Blog

Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:

    Meta Llama 4
    Google Gemma 3
    Qwen 2.5 VL
    Mistral Small 3.1

and more vision models.

General Multimodal Understanding & Reasoning

Llama 4 Scout

    ollama run llama4:scout

(Note: this is a 109 billion parameter, mixture-of-experts model.)

Example: asking location-based questions about a video frame:

You can then ask follow-up questions:

    ollama@ollamas-computer ~ % ollama run llama4:scout
    >>> what ...
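The interactive session above can also be approximated programmatically. The following is a minimal sketch, not taken from the post, that assumes a local Ollama server listening on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint, which accepts base64-encoded images in an `images` field. The helper names `build_payload` and `ask_about_image`, and the file name `frame.png`, are hypothetical and chosen only for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming request body with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        # The endpoint expects images as a list of base64 strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_about_image(model: str, prompt: str, image_path: str,
                    url: str = OLLAMA_URL) -> str:
    """Send an image and a question to the model and return its text reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(model, prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": False the server returns a single JSON object
        # whose "response" field holds the generated text.
        return json.loads(response.read())["response"]


# Hypothetical usage (requires a running server and a pulled model):
# print(ask_about_image("llama4:scout", "Where was this taken?", "frame.png"))
```

Keeping payload construction separate from the network call makes the request shape easy to inspect without a running server. Follow-up questions, as in the session above, would need the conversation history to be sent again; this sketch handles a single question only.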