GitHub - apple/ml-fastvlm: This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
FastVLM: Efficient Vision Encoding for Vision Language Models
This is the official repository of FastVLM: Efficient Vision Encoding for Vision Language Models (CVPR 2025).
Highlights
We introduce FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images.
Our smallest variant outperforms LLaVA-OneVision-0.5B with 85x faster Time-to-First-Token (TTFT) and a 3.4x smaller vision encoder.
Our larger variants using Qwen2-7B...
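To make the TTFT figure above concrete, here is a minimal sketch of how time-to-first-token could be measured for any streaming generator. It assumes TTFT means the wall-clock delay between submitting a request and receiving the first output token. The names `time_to_first_token` and `fake_vlm` are hypothetical and are not part of this repository; `fake_vlm` only sleeps to stand in for image encoding and prompt prefill.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_token(generate: Callable[[], Iterable[str]]) -> Tuple[float, str]:
    """Return (seconds until the first token arrives, that first token)."""
    start = time.perf_counter()
    stream: Iterator[str] = iter(generate())
    first = next(stream)  # blocks until the first token has been produced
    return time.perf_counter() - start, first


def fake_vlm() -> Iterator[str]:
    # Hypothetical stand-in for a real model: the sleep mimics the time
    # spent on vision encoding and prompt prefill before any token is emitted.
    time.sleep(0.05)
    yield "A"
    yield " cat"


ttft, token = time_to_first_token(fake_vlm)
print(f"TTFT: {ttft * 1000:.1f} ms, first token: {token!r}")
```

With a real model, `generate` would wrap whatever streaming call the inference stack exposes. Comparing the measured value across two models on the same hardware and inputs gives a speedup ratio of the kind quoted above.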