Introduction to Search with Sphinx by Andrew Aksyonoff

Preface

I can’t quite believe it, but just 10 years ago there was no Google.

Other web search engines were around back then, such as AltaVista, HotBot, Inktomi, and AllTheWeb, among others. So the stunningly swift ascendance of Google can settle in my mind, given some effort. But what’s even more unbelievable is that just 20 years ago there were no web search engines at all. That’s only logical, because there was barely any Web! But it’s still hardly believable today.

The world is rapidly changing. The volume of information available and the connection bandwidth that gives us access to that information grow substantially every year, making all kinds (and volumes!) of data increasingly accessible. A 1-million-row database of geographical locations, which was mind-blowing 20 years ago, is now something a fourth-grader can quickly fetch off the Internet and play with on his netbook. But the processing rate at which human beings can consume information does not change much (and said fourth-grader would still likely have to read complex location names one syllable at a time). This inevitably transforms searching from something that only eggheads would ever care about to something that every single one of us has to deal with on a daily basis.

Where does this leave the application developers for whom this book is written? Searching changes from a high-end, optional feature to essential functionality that absolutely has to be provided to end users. People trained by Google no longer expect a 50-component form with check boxes, radio buttons, drop-down lists, roll-outs, and every other bell and whistle that clutters an application GUI to the point where it resembles a Boeing 797 pilot deck. They now expect a simple, clean text search box.

But this simplicity is an illusion. A whole lot is happening under the hood of that text search box. There are a lot of different usage scenarios, too: web searching, vertical searching such as product search, local email searching, image searching, and other search types. And while a search system such as Sphinx relieves you of the implementation details of complex, low-level, full-text index and query processing, you will still need to handle certain high-level tasks.

How exactly will the documents be split into keywords? How will the queries that might need additional syntax (such as cats AND dogs) work? How do you implement matching that is more advanced than just exact keyword matching? How do you rank the results so that the text that is most likely to interest the reader will pop up near the top of a 200-result list, and how do you apply your business requirements to that ranking? How do you maintain the search system instance? Show nicely formatted snippets to the user? Set up a cluster when your database grows past the point where it can be handled on a single machine? Identify and fix bottlenecks if queries start working slowly? These are only a few of all the questions that come up during development, which only you and your team can answer because the choices are specific to your particular application.

This book covers most of the basic Sphinx usage questions that arise in practice. I am not aiming to talk about all the tricky bits and visit all the dark corners; because Sphinx is currently evolving so rapidly that even the online documentation lags behind the software, I don’t think comprehensiveness is even possible. What I do aim to create is a practical field manual that teaches you how to use Sphinx from a basic to an advanced level.

Audience

I assume that readers have a basic familiarity with tools for system administrators and programmers, including the command line and simple SQL. Programming examples are in PHP, because of its popularity for website development.

Organization of This Book

This book consists of six chapters, organized as follows:

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, filenames, Unix utilities, and command-line options

Constant width

Indicates variables and other code elements, the contents of files, and the output from commands

Constant width bold

Shows commands or other text that should be typed literally by the user (such as the contents of full-text queries)

Constant width italic

Shows text that should be replaced with user-supplied values

Note

This icon signifies a tip, suggestion, or general note.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Introduction to Search with Sphinx, by Andrew Aksyonoff. Copyright 2011 Andrew Aksyonoff, 978-0-596-80955-3.”

If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at .

We’d Like to Hear from You

Every example in this book has been tested on various platforms, but occasionally you may encounter problems. The information in this book has also been verified at each step of the production process. However, mistakes and oversights can occur and we will gratefully receive details of any you find, as well as any suggestions you would like to make for future editions. You can contact the authors and editors at:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/9780596809553

To comment or ask technical questions about this book, send email to the following address, mentioning the book’s ISBN (978-0-596-80955-3):

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

Acknowledgments

Special thanks are due to Peter Zaitsev for all his help with the Sphinx project over the years and to Andy Oram for being both very committed and patient while making the book happen. I would also like to thank the rest of the O'Reilly team involved and, last but not least, the rest of the Sphinx team.
