I can’t quite believe it, but just 10 years ago there was no Google.
Other web search engines were around back then, such as AltaVista, HotBot, Inktomi, and AllTheWeb, among others. So, given some effort, I can wrap my head around the stunningly swift ascendance of Google. But what’s even more unbelievable is that just 20 years ago there were no web search engines at all. That’s only logical, because there was barely any Web! But it’s still hardly believable today.
The world is rapidly changing. The volume of available information and the connection bandwidth that gives us access to it grow substantially every year, making data of every kind (and in every volume!) increasingly accessible. A 1-million-row database of geographical locations, which was mind-blowing 20 years ago, is now something a fourth-grader can quickly fetch off the Internet and play with on his netbook. But the rate at which human beings can consume information does not change much (and said fourth-grader would still likely have to read complex location names one syllable at a time). This inevitably transforms searching from something that only eggheads would ever care about into something that every single one of us has to deal with on a daily basis.
Where does this leave the application developers for whom this book is written? Searching changes from a high-end, optional feature to an essential function that absolutely has to be provided to end users. People trained by Google no longer expect a 50-component form with check boxes, radio buttons, drop-down lists, roll-outs, and every other bell and whistle that clutters an application GUI to the point where it resembles a Boeing 797 pilot deck. They now expect a simple, clean text search box.
But this simplicity is an illusion. A whole lot is happening under the hood of that text search box. There are a lot of different usage scenarios, too: web searching, vertical searching such as product search, local email searching, image searching, and other search types. And while a search system such as Sphinx relieves you of the implementation details of complex, low-level, full-text index and query processing, you will still need to handle certain high-level tasks.
How exactly will the documents be split into keywords? How will queries that need additional syntax (such as cats AND dogs) work? How do you implement matching that is more advanced than just exact keyword matching? How do you rank the results so that the text most likely to interest the reader pops up near the top of a 200-result list, and how do you apply your business requirements to that ranking? How do you maintain the search system instance? Show nicely formatted snippets to the user? Set up a cluster when your database grows past the point where a single machine can handle it? Identify and fix bottlenecks if queries start running slowly? These are only a few of the questions that come up during development, and only you and your team can answer them, because the choices are specific to your application.
This book covers most of the basic Sphinx usage questions that arise in practice. I am not aiming to cover every tricky bit or visit every dark corner: Sphinx is currently evolving so rapidly that even the online documentation lags behind the software, so I don’t think comprehensiveness is even possible. What I do aim to create is a practical field manual that teaches you how to use Sphinx from a basic to an advanced level.
I assume that readers have a basic familiarity with tools for system administrators and programmers, including the command line and simple SQL. Programming examples are in PHP, because of its popularity for website development.
This book consists of six chapters, organized as follows:
Chapter 1, The World of Text Search, lays out the types of search and the concepts you need to understand regarding the particular ways Sphinx conducts searches.
Chapter 2, Getting Started with Sphinx, tells you how to install and configure Sphinx and run a few basic tests.
Chapter 3, Basic Indexing, shows you how to set up Sphinx indexing for either an SQL database or XML data, and includes some special topics such as handling different character sets.
Chapter 4, Basic Searching, describes the syntax of search text, which can be exposed to the end user or generated from an application, and the effects of various search options.
Chapter 5, Managing Indexes, offers strategies, such as multi-index searching, for dealing with large data sets (which means nearly any real-life data set).
Chapter 6, Relevance and Ranking, gives you some guidelines for the crucial goal of presenting the best results to the user first.
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, filenames, Unix utilities, and command-line options
Constant width
Indicates variables and other code elements, the contents of files, and the output from commands
Constant width bold
Shows commands or other text that should be typed literally by the user (such as the contents of full-text queries)
Constant width italic
Shows text that should be replaced with user-supplied values
This icon signifies a tip, suggestion, or general note.
This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Introduction to Search with Sphinx, by Andrew Aksyonoff. Copyright 2011 Andrew Aksyonoff, 978-0-596-80955-3.”
If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at email@example.com.
Every example in this book has been tested on various platforms, but occasionally you may encounter problems. The information in this book has also been verified at each step of the production process. However, mistakes and oversights can occur and we will gratefully receive details of any you find, as well as any suggestions you would like to make for future editions. You can contact the authors and editors at:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:
To comment or ask technical questions about this book, send email to the following address, mentioning the book’s ISBN (978-0-596-80955-3):
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.
With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.
O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.
Special thanks are due to Peter Zaitsev for all his help with the Sphinx project over the years and to Andy Oram for being both very committed and patient while making the book happen. I would also like to thank the rest of the O'Reilly team involved and, last but not least, the rest of the Sphinx team.