Chapter 13. Full-Text Search
Now that we have covered the simple case of searching for structured data, it is time to explore full-text search: how to search within full-text fields in order to find the most relevant documents.
The two most important aspects of full-text search are as follows:
- Relevance
  The ability to rank results by how relevant they are to the given query, whether relevance is calculated using TF/IDF (see “What Is Relevance?”), proximity to a geolocation, fuzzy similarity, or some other algorithm.
- Analysis
  The process of converting a block of text into distinct, normalized tokens (see “Analysis and Analyzers”) in order to (a) create an inverted index and (b) query the inverted index.
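To make the two roles of analysis concrete, here is a minimal sketch in Python. It is not how Elasticsearch implements analysis: the analyze function below is a hypothetical toy analyzer that only splits on non-alphanumeric characters and lowercases, and the sample documents are invented for illustration. The point is that the same analyzer is applied at index time and again at query time.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption, not a real Elasticsearch analyzer):
    split on non-alphanumeric characters and lowercase each token."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs):
    """(a) Index time: map each normalized token to the IDs of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def full_text_search(index, query_text):
    """(b) Query time: analyze the query the same way, then look up each token."""
    matches = set()
    for token in analyze(query_text):
        matches |= index.get(token, set())
    return matches


# Invented sample documents.
docs = {
    1: "The quick brown Fox",
    2: "Quick foxes jump",
    3: "Lazy dogs sleep",
}
index = build_inverted_index(docs)

print(analyze("The quick brown Fox"))
print(full_text_search(index, "QUICK"))
```

Because the query text passes through the same analyzer as the documents, a search for QUICK matches documents that were indexed with Quick or quick. This toy analyzer does no stemming, so Fox would not match foxes.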
As soon as we talk about either relevance or analysis, we are in the territory of queries, rather than filters.
Term-Based Versus Full-Text
While all queries perform some sort of relevance calculation, not all queries have an analysis phase. Besides specialized queries like the bool or function_score queries, which don’t operate on text at all, textual queries can be broken down into two families:
- Term-based queries
  Queries like the term or fuzzy queries are low-level queries that have no analysis phase. They operate on a single term. A term query for the term Foo looks for that exact term in the inverted index and calculates the TF/IDF relevance _score for each document that contains the term.
  It is important to remember that the term query looks in the inverted index ...
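The behavior of a term-based lookup can be sketched as follows. This is an illustration only: the inverted index contents are invented, we assume the field was indexed by an analyzer that lowercases its tokens, and the score is a textbook TF/IDF rather than the formula Lucene actually uses. The term_query function is a hypothetical helper, not part of any Elasticsearch client.

```python
import math

# Invented inverted index, as a lowercasing analyzer might have built it:
# term -> {doc_id: term frequency in that document}
inverted_index = {
    "foo": {1: 2, 2: 1},
    "bar": {2: 1, 3: 3},
}
TOTAL_DOCS = 3


def term_query(index, term, total_docs):
    """Look up `term` exactly as given (no analysis) and score each matching doc."""
    postings = index.get(term, {})
    if not postings:
        return {}
    # Textbook TF/IDF (an assumption for illustration, not Lucene's scoring).
    idf = math.log(total_docs / len(postings))
    return {doc_id: tf * idf for doc_id, tf in postings.items()}


hits = term_query(inverted_index, "foo", TOTAL_DOCS)    # exact term exists
misses = term_query(inverted_index, "Foo", TOTAL_DOCS)  # no such term in the index

print(hits)
print(misses)
```

Under these assumptions, the lookup for Foo returns nothing, because the query term is compared as is against the terms stored in the index and only the lowercase foo exists there. Document 1 scores higher than document 2 for foo because the term occurs in it more often.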