Chapter 2. Lexical Search

It’s important to understand how lexical search works, because it predates semantic search and anticipates how semantic search works. Lexical search for unstructured text works directly with language by decomposing blocks of text into words and matching those words to the words in a query. You might think of it as the “assembly code” of searching.

Consider a website that advises its users on which games to play. The builders of the site use game descriptions as the corpus of documents for their search engine. They might include the following two blocks of text, which discuss the dice game craps and the card game blackjack, in their respective documents:

Text sample A (craps) from Wikipedia: The shooter must shoot toward the farther back wall and is generally required to hit the farther back wall with both dice. Casinos may allow a few warnings before enforcing the dice to hit the back wall and are generally lenient if at least one die hits the back wall.
Text sample B (blackjack) from Wikipedia: The dealer deals from their left (“first base”) to their far right (“third base”). Each box gets an initial hand of two cards visible to the people playing on it. The dealer’s hand gets its first card face-up and, in “hole card” games, immediately gets a second card face-down.

For the search query “play dice games,” text sample A seems like a better match than text sample B. Most search engines solve this problem in three phases: matching, merging, and ranking.
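To make the idea concrete, here is a minimal Python sketch that runs the query against the two text samples. It is a toy under stated assumptions, not how any particular search engine implements these phases: the tokenizer (lowercase, split on non-word characters), the `search` function, and the scoring rule (total count of query-term occurrences) are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# The two text samples quoted above, keyed by a label.
docs = {
    "A": (
        "The shooter must shoot toward the farther back wall and is generally "
        "required to hit the farther back wall with both dice. Casinos may allow "
        "a few warnings before enforcing the dice to hit the back wall and are "
        "generally lenient if at least one die hits the back wall."
    ),
    "B": (
        "The dealer deals from their left (“first base”) to their far right "
        "(“third base”). Each box gets an initial hand of two cards visible to "
        "the people playing on it. The dealer’s hand gets its first card face-up "
        "and, in “hole card” games, immediately gets a second card face-down."
    ),
}


def tokenize(text):
    """Assumed tokenizer: lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def search(query, docs):
    """Toy lexical search: match query terms, collect hits per document, rank."""
    query_terms = tokenize(query)
    results = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        # Matching: which query terms occur in this document, and how often?
        matched = {t: counts[t] for t in query_terms if counts[t] > 0}
        # Merging: keep any document that matched at least one term.
        if matched:
            results.append((doc_id, sum(matched.values()), matched))
    # Ranking (assumed rule): more query-term occurrences ranks higher.
    return sorted(results, key=lambda r: r[1], reverse=True)


ranked = search("play dice games", docs)
for doc_id, score, matched in ranked:
    print(doc_id, score, matched)
```

With this exact-match tokenizer, sample A ranks first because “dice” occurs twice in it, while sample B matches only “games” once. Neither sample matches “play”: sample B contains “playing,” and sample A contains “die,” but an exact word match treats those as different terms.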

Matching

Search engines ...

Natural Language and Search by Jon Handler, Milind Shyani, Karen Kilroy
