Chapter 13. Full-Text Search

Now that we have covered the simple case of searching for structured data, it is time to explore full-text search: how to search within full-text fields in order to find the most relevant documents.

The two most important aspects of full-text search are as follows:

Relevance: The ability to rank results by how relevant they are to the given query, whether relevance is calculated using TF/IDF (see “What Is Relevance?”), proximity to a geolocation, fuzzy similarity, or some other algorithm.
Analysis: The process of converting a block of text into distinct, normalized tokens (see “Analysis and Analyzers”) in order to (a) create an inverted index and (b) query the inverted index.
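The two uses of analysis named above, building an inverted index and querying it, can be sketched in a few lines of Python. This is a toy illustration only, not how Elasticsearch implements analysis: the `analyze`, `build_inverted_index`, and `search` functions and the sample documents are hypothetical, and the "analyzer" here does nothing more than lowercase the text and split it on non-alphanumeric characters.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption): lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_inverted_index(docs):
    """(a) Map each normalized token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query_text):
    """(b) Analyze the query the same way, then collect documents matching any token."""
    hits = set()
    for token in analyze(query_text):
        hits |= index.get(token, set())
    return hits


# Hypothetical sample documents.
docs = {1: "The quick brown Fox", 2: "Quick foxes leap!"}
index = build_inverted_index(docs)
```

Because the query text passes through the same `analyze` function as the indexed text, `search(index, "FOX")` finds document 1 even though the case differs from the original.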

As soon as we talk about either relevance or analysis, we are in the territory of queries, rather than filters.

Term-Based Versus Full-Text

While all queries perform some sort of relevance calculation, not all queries have an analysis phase. Besides specialized queries like the bool or function_score queries, which don’t operate on text at all, textual queries can be broken down into two families:

Term-based queries: Queries such as term and fuzzy are low-level queries that have no analysis phase. They operate on a single term. A term query for the term Foo looks for that exact term in the inverted index and calculates the TF/IDF relevance _score for each document that contains the term.
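As a rough sketch of that behavior, the Python below looks up a term exactly as given and scores each matching document. Everything in it is an assumption made for illustration: the `postings` structure, the `term_query` function, and the scoring formula (square root of term frequency times a smoothed inverse document frequency) are simplified stand-ins, not the formula Elasticsearch or Lucene actually uses. The sample postings are assumed to have been produced by an analyzer that lowercases text.

```python
import math

# Hypothetical postings: term -> {doc_id: term frequency in that document}.
# Assumed to come from an analyzer that lowercased the original text.
postings = {
    "foo": {1: 2, 2: 1},
    "bar": {2: 1, 3: 1},
}
TOTAL_DOCS = 3


def term_query(postings, term, total_docs):
    """Look up the term exactly as given (no analysis) and score each match."""
    matches = postings.get(term, {})
    if not matches:
        return {}
    # Simplified TF/IDF: terms found in fewer documents weigh more.
    idf = 1.0 + math.log(total_docs / len(matches))
    return {doc_id: math.sqrt(tf) * idf for doc_id, tf in matches.items()}
```

With these sample postings, `term_query(postings, "Foo", TOTAL_DOCS)` returns no documents because the lookup is not lowercased first, while `term_query(postings, "foo", TOTAL_DOCS)` matches documents 1 and 2 and scores document 1 higher because the term occurs there more often.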

It is important to remember that the term query looks in the inverted index ...


Elasticsearch: The Definitive Guide by
