Skip to content
VIRUCHITH.COM
Back to articles
Search SearchAlgorithmFuzzy SearchComputer Science

What is Fuzzy Search? A forgiving approach to finding data

Stop losing users to simple typos. Learn how fuzzy search algorithms like Levenshtein Distance, N-grams, and Phonetic Matching create a more 'forgiving' search experience. Explore implementation strategies for both frontend (Fuse.js) and backend (Elasticsearch) environments.

Author

Viruchith Ganesan

Feb 1, 2026
5 min read
Fuzzy search comic illustration.

Imagine asking a very strict librarian for a book about dinosaurs. If you accidentally wrote dinosaur’s on your slip, a strict librarian would check their index, find zero matches for that exact spelling, and tell you the book doesn’t exist. A fuzzy librarian, however, would look at your slip and say, That looks a lot like dinosaurs. Here are the books we have on that topic. Fuzzy search is a technique used in software to find matches that are approximately equal to the search term, rather than exactly equal. It handles human errors like typos, misspellings, missing characters, or slightly different variations of a word. It is essential for a good user experience. Users expect search bars to understand their intent, not just their exact keystrokes.

How does a computer know that “kitten” is similar to “sitting”? It uses mathematical algorithms to measure the “distance” or similarity between two strings of text.

Here are the most popular ones:

1. Levenshtein Distance (The Edit Distance):

This is the most common algorithm. It calculates the minimum number of single-character edits required to change one word into another. An edit is defined as an insertion, a deletion, or a substitution.

  • Example: Changing cat to bat requires 1 substitution (c -gt; b). The distance is 1.
  • Example: Changing cot to cart requires 1 substitution (o -gt; a) and 1 insertion (r). The distance is 2.The lower the distance, the more similar the words are.

2. Damerau-Levenshtein Distance:

This is an improvement on Levenshtein. It includes the standard three edits but adds a fourth: transposition (swapping two adjacent letters).

  • Example: Typing “teh” instead of “the”. Levenshtein sees this as 2 edits (substitute ‘e’, substitute ‘h’). Damerau-Levenshtein sees this as 1 edit (swap ‘e’ and ‘h’). This is often more accurate for identifying common typing errors.

3. N-grams (Trigrams)

Instead of looking at the whole word, this method breaks strings into overlapping chunks of n characters (often 3, called “trigrams”).

  • Example: The word “banana” breaks down into these trigrams: ban, ana, nan, ana.

If you search for “bananna” (misspelled), it shares many of the same trigrams with “banana.” The search engine counts overlapping chunks to determine similarity. This is very efficient for large datasets.

4. Phonetic Matching (Soundex, Metaphone)

These algorithms encode words based on how they sound when spoken, rather than how they are spelled.

  • Example: “Smith” and “Smyth” might both be encoded to a code like S530. If the codes match, the words sound the same. This is great for names.

Scenario A: Frontend Implementation (Simple JS Dropdown)

Context: You have a dropdown menu on a webpage with a list of 200 country names. You want users to be able to type to filter the list, handling typos.

The Challenge: You have a small dataset loaded entirely in the user’s browser memory. You need something lightweight and fast in JavaScript.

The Solution: Fuse.js While you could write a Levenshtein algorithm from scratch in JS, it is inefficient. The standard industry approach for frontend fuzzy search is a lightweight library called Fuse.js.

Fuse.js is powerful, fast, and easy to configure. It uses a combination of bitap algorithms and Levenshtein distance principles optimized for JavaScript.

Implementation Example:

// 1. The Data (Small dataset loaded in browser)
const countries = [
  { name: "United States", code: "US" },
  { name: "United Kingdom", code: "UK" },
  { name: "Ukraine", code: "UA" },
  { name: "Uruguay", code: "UY" },
  // ... rest of the list
];

// 2. Configure Fuse options
const options = {
  includeScore: true,
  // 'threshold' determines how fuzzy the match is.
  // 0.0 = exact match only. 0.6 is usually a good fuzzy starting point.
  threshold: 0.4,
  // Which keys in your object do you want to search?
  keys: ['name']
};

// 3. Initialize Fuse with data and options
const fuse = new Fuse(countries, options);

// 4. Run search based on user input in the dropdown
const userInput = "Ukriane"; // Typo intended
const result = fuse.search(userInput);

console.log(result);

/* Output will prioritize the closest matches:
[
  { item: { name: "Ukraine", code: "UA" }, score: 0.18 },
  { item: { name: "United Kingdom", code: "UK" }, score: 0.55 },
  ...
]
*/

In this scenario, the JavaScript calculates the fuzziness on the fly whenever the user types.

Scenario B: Backend Implementation (Large Dataset)

Context: An e-commerce site like Amazon with 50 million products.

The Challenge: You cannot load 50 million products into a user’s browser. The search must happen on a powerful server. Furthermore, scanning every single product name to calculate Levenshtein distance every time someone searches is too slow.

The Solution: Elasticsearch or Apache Solr These are dedicated search engines (built on top of Lucene) designed for massive scale. They handle fuzzy search differently to ensure speed.

They use Inverted Indexes. Instead of scanning text during the search, they analyze the text when the data is saved (indexed).

There are two main ways they handle fuzzy search at scale:

1. The “Fuzzy Query” (Query Time)

Elasticsearch has a built-in fuzzy query type. When a user searches, it uses sophisticated edit-distance calculations (Levenshtein) optimized by automata theory to find close terms in the index.

You usually set fuzziness to AUTO, which tells the engine: If the word is short (3 letters), only allow exact matches. If its longer (lets say 6+ letters), allow 1 or 2 edits.

2. N-gram Indexing (Index Time - Better for Performance)

For the highest performance on huge datasets, you don’t do the math when the user searches. You prepare the data beforehand using N-grams.

When you save the product “Samsung TV” into Elasticsearch, you tell it to break the name down into trigrams and store them in the index: sam, ams, msu, sun, ung, tv.

When a user searches for “Samsong”, the engine breaks that query into trigrams and looks for documents that share the most trigrams with the query. This becomes a simple matching task rather than complex math, making it incredibly fast at scale.

Continue reading

Related articles