Alibaba's Qwen3-VL AI model analyzes two-hour videos with 99.5% accuracy, beats GPT-5 and Gemini on visual math tasks using 235 billion parameters

Qwen3-VL can scan two-hour videos and pinpoint nearly every detail

A few months after launching Qwen3-VL, Alibaba has released a detailed technical report on the open multimodal model. The data shows the system excels at image-based math tasks and can analyze hours of video footage. It processes two-hour videos or hundreds of document pages within a 256,000-token context window. In "needle-in-a-haystack" tests, the flagship 235-billion-parameter model located individual frames in 30-minute videos with 100 percent accuracy. E...