A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images? - OranLooney.com
June 5, 2024
Machine Learning
LLM
CNN
Here’s a fact: GPT-4o charges 170 tokens to process each 512x512 tile
used in high-res mode. At ~0.75 tokens/word, this suggests a picture is worth
about 227 words—only a factor of four off from the traditional saying.
(There’s also an 85-token charge for a low-res ‘master thumbnail’ of each picture,
and higher-resolution images are broken into many such 512x512 tiles,
but let’s just focus on a single high-res tile.)
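The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted here (170 tokens per 512x512 tile, 85 tokens for the thumbnail, ~0.75 tokens/word); the function names are made up for this example, and the tile count assumes the image is already at its final size, ignoring any resizing the API may do before tiling.

```python
import math

# Figures quoted in the text above.
TILE_SIZE = 512          # high-res tiles are 512x512 pixels
TOKENS_PER_TILE = 170    # charge per high-res tile
THUMBNAIL_TOKENS = 85    # flat charge for the low-res 'master thumbnail'
TOKENS_PER_WORD = 0.75   # the rough conversion rate used in the text


def tile_count(width: int, height: int) -> int:
    """Number of 512x512 tiles needed to cover the image.

    Assumption: the image is tiled at the given size, with no resizing first.
    """
    return math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)


def image_token_cost(width: int, height: int) -> int:
    """Thumbnail charge plus the per-tile charge for every tile."""
    return THUMBNAIL_TOKENS + TOKENS_PER_TILE * tile_count(width, height)


def tokens_to_words(tokens: float) -> float:
    """Convert a token count to an approximate word count."""
    return tokens / TOKENS_PER_WORD


print(round(tokens_to_words(TOKENS_PER_TILE)))  # 227 words for a single tile
print(image_token_cost(512, 512))               # 255 = 85 + 170
print(image_token_cost(1024, 1024))             # 765 = 85 + 4 * 170
```

Under these assumptions a single tile costs 255 tokens once the thumbnail is included, which is why the discussion isolates the 170-token tile charge.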
OK, but why 170? It’s an oddly specific n...