BM25 (Best Matching 25) is a ranking function used in information retrieval to rank documents based on their relevance to a search query. It's an improvement over TF-IDF that addresses some of its limitations.
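Key improvements over TF-IDF: term-frequency saturation (controlled by k1), so that repeated occurrences of a term add progressively less to the score, and document-length normalization (controlled by b), so that long documents are not ranked higher merely for containing more words.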
Formula:
BM25(D, Q) = Σ_{qi ∈ Q} IDF(qi) × (f(qi,D) × (k1 + 1)) / (f(qi,D) + k1 × (1 - b + b × |D| / avgdl))
Where:
f(qi,D) = frequency of term qi in document D
|D| = length of document D
avgdl = average document length
k1 = term frequency saturation parameter (typical: 1.2-2.0)
b = length normalization parameter (typical: 0.75)
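To make the formula concrete, here is a minimal Python sketch of it. Several things in it are assumptions rather than part of the definition above: the function name `bm25_scores`, the pre-tokenized list-of-words input, the default `k1=1.5`, and in particular the IDF variant, which the text does not specify. The sketch uses the common smoothed form ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number containing the term.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; assumes `docs` is a non-empty
    list of non-empty token lists.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    def idf(term):
        # Assumed IDF variant (not given in the text): smoothed, never negative.
        n = df.get(term, 0)
        return math.log(1 + (n_docs - n + 0.5) / (n + 0.5))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length-normalization part of the denominator: k1 * (1 - b + b * |D| / avgdl)
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for q in query:
            f = tf.get(q, 0)  # f(qi, D)
            score += idf(q) * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat around the yard all day".split(),
    "stock markets fell sharply".split(),
]
scores = bm25_scores(["cat"], docs)
```

With this toy corpus, the first two documents each contain "cat" once, but the shorter one scores higher because of length normalization, and the third document scores zero because it does not contain the query term. Setting b=0 turns length normalization off, and a larger k1 lets repeated occurrences keep adding to the score for longer before it saturates.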