Interactive demonstrations of the basic principles behind search engines
Thanks to computers, we have been able in recent decades to store more information efficiently than ever before. Searching effectively through all this information, and finding the right document, is key to making good use of this ever-growing amount of data.
Some search techniques can be quite advanced, but the basics of what we call Information Retrieval can be explained quite simply.
This playground explores the core algorithms that power search systems, from traditional keyword-based approaches to modern semantic search techniques. Each demo is interactive, and playing with it is the best way to understand the pros and cons of each technique explored.
Traditional search systems rely on lexical matching, meaning that they look for overlapping words between your query and the documents.
The key insight is that not all words are equally important. A document that mentions "quantum" repeatedly when most documents don't is probably more relevant for a query about "quantum physics" than one that just happens to use the word "the" many times.
TF-IDF is an algorithm that formalizes this intuition by balancing term frequency (how often a word appears in a document) against its document frequency (how common it is across all documents).
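The intuition can be sketched in a few lines of Python. This is a minimal illustrative variant (raw counts and an unsmoothed `log(N / df)` inverse document frequency; real systems usually smooth and normalize these quantities), with a toy corpus whose word lists are invented for the example:

```python
import math

def tf_idf(term, doc, corpus):
    """Score one term in one document against a small corpus.

    tf  : raw count of the term in the document.
    idf : log(N / df), where df is the number of documents
          containing the term. Rare terms get a high idf,
          ubiquitous terms a low one.
    """
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Toy corpus: each document is just a list of tokens.
corpus = [
    ["quantum", "physics", "quantum", "entanglement"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["classical", "physics", "and", "the", "apple"],
]

# "quantum" is frequent in doc 0 but appears in only one document,
# so it scores high; "the" appears in several documents, so its
# idf (and hence its score) is much lower.
print(tf_idf("quantum", corpus[0], corpus))
print(tf_idf("the", corpus[1], corpus))
```

The key design point is the multiplication: a term only scores well when it is both frequent in this document and rare in the corpus.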
Another algorithm, BM25, goes even further by addressing issues like term saturation and document length normalization.
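Both refinements show up directly in the standard Okapi BM25 formula. Below is a sketch of the per-term score with the usual default parameters (k1 = 1.5, b = 0.75); the two-document corpus is invented purely to demonstrate saturation:

```python
import math

def bm25_score(term, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25 contribution of a single query term.

    k1 controls term-frequency saturation: repeating a term
    helps, but with diminishing returns.
    b controls document-length normalization: long documents
    are penalized relative to the corpus average length.
    """
    N = len(corpus)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
    tf = doc.count(term)
    avgdl = sum(len(d) for d in corpus) / N
    denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
    return idf * tf * (k1 + 1) / denom

# Two equal-length toy documents isolate the saturation effect.
docs = [
    ["quantum", "physics"],
    ["quantum", "quantum"],
]
s1 = bm25_score("quantum", docs[0], docs)
s2 = bm25_score("quantum", docs[1], docs)
# Doubling the term count raises the score, but by less than 2x.
print(s1, s2)
```

Compare this with plain TF-IDF, where doubling the count of a term exactly doubles its score; BM25's saturation is what stops a document from winning just by repeating a keyword.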
Modern search systems go beyond simple keyword matching by understanding meaning.
Vector search captures the meaning of queries and documents in what we call vectors: mathematical objects that let us compute the proximity between queries and documents.
With this technique, a document that is semantically similar to your query can be retrieved even if the two don't share a single word.
For example, a search for "machine learning" might return documents about "neural networks" or "artificial intelligence" because these concepts are embedded near each other in the vector space.
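Once texts are turned into vectors, ranking reduces to measuring angles between them, typically with cosine similarity. The sketch below uses hand-picked 2-D coordinates as stand-in "embeddings" (real systems use embedding models producing hundreds of dimensions; these values are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same
    direction (similar meaning), 0 orthogonal, -1 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-D embeddings: related concepts sit close together.
embeddings = {
    "machine learning":      [0.90, 0.80],
    "neural networks":       [0.85, 0.90],
    "chocolate cake recipe": [0.10, -0.70],
}

query = embeddings["machine learning"]
ranked = sorted(
    embeddings,
    key=lambda d: cosine_similarity(query, embeddings[d]),
    reverse=True,
)
print(ranked)
```

Note that "neural networks" outranks "chocolate cake recipe" even though neither shares a word with "machine learning": proximity in the vector space is doing the work that keyword overlap does in TF-IDF and BM25.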
Term Frequency-Inverse Document Frequency is the foundation of keyword-based search. See how it identifies important terms by balancing how often words appear in a document against how rare they are across your entire collection.
Try TF-IDF Demo →
Best Matching 25 improves on TF-IDF with term frequency saturation and document length normalization.
Try BM25 Demo →
Semantic search using vector representations and cosine similarity. Watch as documents are ranked by their semantic closeness to your query, with a 2D visualization showing the geometry of similarity.
Try Vector Search Demo →