Understanding the BM25 full text search algorithm
Nov 19, 2024
BM25, or Best Match 25, is a widely used algorithm for full text search. It is the default ranking function in Lucene/Elasticsearch and SQLite, among others. Recently, it has become common to combine full text search and vector similarity search into "hybrid search". I wanted to understand how full text search works, and BM25 specifically, so this post is my attempt to understand it by re-explaining it.
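For orientation before going through the pieces, here is a minimal sketch of the common textbook form of Okapi BM25 over pre-tokenized documents. It is not the exact code used by Lucene, Elasticsearch, or SQLite. It assumes the `log(1 + ...)` IDF variant (which stays positive even for very common terms) and the commonly used defaults `k1 = 1.2` and `b = 0.75`; the function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper: textbook BM25 with the log(1 + ...) IDF variant.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms across the corpus get a higher IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls term-frequency saturation; b controls how strongly
            # documents longer than average are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "on", "the", "cat"],
    ["a", "dog"],
]
print(bm25_scores(["cat"], docs))
```

Both of the first two documents contain "cat" once, but the shorter one scores higher because of the length normalization, and the third document scores 0. Because each score is a sum of IDF-weighted terms, its magnitude depends on the query and on corpus statistics, which is part of why comparing raw scores across queries is not straightforward.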
Motivation: can BM25 scores be compared across queries?
Ranking documents probabilistically
Components of...
Read more at emschwartz.me