Chapter 1. The World of Text Search
Words frequently have different meanings, and this is evident even in the short description of Sphinx itself. We used to call it a full-text search engine, which is a standard term in the IT knowledge domain. Nevertheless, this occasionally gave the wrong impression that Sphinx was either a Google-competing web service or an embeddable software library that only hardened C++ programmers would ever manage to implement and use. So nowadays, we tend to call Sphinx a search server to stress that it's a suite of programs running on your hardware that you use to implement and maintain full-text searches, similar to how you use a database server to store and manipulate your data.
Sphinx can serve you in a variety of ways and help with quite a number of search-related tasks, and then some. Data sets range from just a few blog posts to web-scale collections that contain billions of documents; workload levels vary from just a few searches per day on a deserted personal website to about 200 million queries per day on Craigslist; and query types range from simple, quick queries that need to return the top 10 matches on a given keyword to sophisticated analytical queries used for data mining tasks, which combine thousands of keywords into a complex text query and add a few nontext conditions on top.
So, there are a lot of things that Sphinx can do, and therefore a lot to discuss. But before we begin, let's ensure that we're on the ...