Introduction to Search with Sphinx

Name: Introduction to Search with Sphinx
Author: Andrew Aksyonoff
ISBN: 9781449308667

April 2011

Beginner

148 pages

4h 15m

English

O'Reilly Media, Inc.

SPECIAL OFFER: Upgrade this ebook with O’Reilly
Preface
Audience
Organization of This Book
Conventions Used in This Book
Using Code Examples
We’d Like to Hear from You
Safari® Books Online
Acknowledgments
1. The World of Text Search
Terms and Concepts in Search
Thinking in Documents Versus Databases
Why Do We Need Full-Text Indexes?
Query Languages
Logical Versus Full-Text Conditions
Logical conditions
Full-text queries
Differences between logical and full-text searches
Natural Language Processing
From Text to Words
Linguistics Crash Course
Relevance, As Seen from Outer Space
Result Set Postprocessing
Full-Text Indexes
Search Workflows
Kinds of Data
Indexing Approaches
Full-Text Indexes and Attributes
Approaches to Searching
Kinds of Results
2. Getting Started with Sphinx
Workflow Overview
Getting Started ... in a Minute
Basic Configuration
Defining Data Sources
Disk-based indexes
RT indexes
Distributed indexes
Declaring Fields and Attributes in SQL Data
Sphinx-Wide Settings
Managing Configurations with Inheritance and Scripting
Accessing searchd
Configuring Interfaces
Using SphinxAPI
Using SphinxQL
Building Sphinx from Source
Quick Build
Source Build Requirements
Configuring Sources and Building Binaries
3. Basic Indexing
Indexing SQL Data
Main Fetch Query
Pre-Queries, Post-Queries, and Post-Index Queries
How the Various SQL Queries Work Together
Ranged Queries for Larger Data Sets
Indexing XML Data
Index Schemas for XML Data
XML Encodings
xmlpipe2 Elements Reference
Working with Character Sets
Handling Stop Words and Short Words
4. Basic Searching
Matching Modes
Full-Text Query Syntax
Known Operators
Escaping Special Characters
AND and OR Operators and a Notorious Precedence Trap
NOT Operator
Field Limit Operator
Phrase Operator
Keyword Proximity Operator
Quorum Operator
Strict Order (BEFORE) Operator
NEAR Operator
SENTENCE and PARAGRAPH Operators
ZONE Limit Operator
Keyword Modifiers
Result Set Contents and Limits
Searching Multiple Indexes
Result Set Processing
Expressions
Filtering
Sorting
Grouping
5. Managing Indexes
The “Divide and Conquer” Concept
Index Rotation
Picking Documents
Handling Updates and Deletions with K-Lists
Scheduling Rebuilds, and Using Multiple Deltas
Merge Versus Rebuild Versus Deltas
Scripting and Reloading Configurations
6. Relevance and Ranking
Relevance Assessment: A Black Art
Relevance Ranking Functions
Sphinx Rankers Explained
BM25 Factor
Phrase Proximity Factor
Overview of the Available Rankers
Nitty-gritty Ranker Details
How Do I Draw Those Stars?
How Do I Rank Exact Field Matches Higher?
How Do I Force Document D to Rank First?
How Does Sphinx Ranking Compare to System XYZ?
Where to Go from Here
About the Author

Copyright

Content preview from Introduction to Search with Sphinx

Chapter 6. Relevance and Ranking

You’re now armed with a good chunk of knowledge about getting up and running with Sphinx, creating and managing indexes, and writing proper queries. However, there’s one more skill that’s of use with nearly every site: improving search quality. So, let’s spend some time discussing quality in general and what Sphinx can offer, shall we?

Relevance Assessment: A Black Art

We can’t really chase down “search quality” until we formally define it and decide how to measure it. An empirical approach, as in “Here, I just made up another custom ranking rule out of thin air and I think it will generally improve our results any time of day,” wears thin very quickly. After about the third such rule, the approach becomes unmanageable, because the number of rule combinations explodes combinatorially, and arguing about (not to mention proving) the value of every single combination quickly becomes impossible. A scientific approach, as in “Let us introduce some comprehensible numerical metrics that can be computed programmatically and then grasped intuitively,” lends itself to automation and scales somewhat better.
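
To make “numerical metrics that can be computed programmatically” concrete, here is a minimal sketch of one widely used metric, precision at k: the fraction of the top k results that a human assessor judged relevant. This preview does not say which metrics the chapter goes on to define, so the choice of metric, the function name `precision_at_k`, and the sample document IDs are all assumptions made for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Return the fraction of the top-k results that were judged relevant.

    ranked_ids   -- document IDs as returned by the search, best match first
    relevant_ids -- set of document IDs that assessors marked relevant
    k            -- how many top results to inspect (must be positive)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k, not len(top_k): a short result list is penalized.
    return hits / k


# Hypothetical data for a single query (not taken from the book).
ranked = [7, 3, 12, 5, 9]   # search output, best first
relevant = {3, 5, 8}        # human relevance judgments

print(precision_at_k(ranked, relevant, 3))
print(precision_at_k(ranked, relevant, 5))
```

A number like this can be recomputed automatically over a fixed set of judged queries each time a ranking rule changes, which is what lets the comparison scale where rule-by-rule argument does not.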

So, what is search quality? Chapter 1 mentioned that documents in the result set are, by default, ordered using a relevance ranking function that assigns a different weight to every document, based on the current query, document contents, other document attributes, and other factors. But it’s very important to realize that the relevance value that is computed ...

Publisher Resources

ISBN: 9780596809546
