Introduction to Search with Sphinx

Name: Introduction to Search with Sphinx
Author: Andrew Aksyonoff
ISBN: 9781449308667

April 2011

Beginner

148 pages

4h 15m

English

O'Reilly Media, Inc.

Introduction to Search with Sphinx
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Preface
Audience; Organization of This Book; Conventions Used in This Book; Using Code Examples; We’d Like to Hear from You; Safari® Books Online; Acknowledgments
1. The World of Text Search
Terms and Concepts in Search; Thinking in Documents Versus Databases; Why Do We Need Full-Text Indexes?; Query Languages; Logical Versus Full-Text Conditions; Logical conditions; Full-text queries; Differences between logical and full-text searches; Natural Language Processing; From Text to Words; Linguistics Crash Course; Relevance, As Seen from Outer Space; Result Set Postprocessing; Full-Text Indexes; Search Workflows; Kinds of Data; Indexing Approaches; Full-Text Indexes and Attributes; Approaches to Searching; Kinds of Results
2. Getting Started with Sphinx
Workflow Overview; Getting Started ... in a Minute; Basic Configuration; Defining Data Sources; Disk-based indexes; RT indexes; Distributed indexes; Declaring Fields and Attributes in SQL Data; Sphinx-Wide Settings; Managing Configurations with Inheritance and Scripting; Accessing searchd; Configuring Interfaces; Using SphinxAPI; Using SphinxQL; Building Sphinx from Source; Quick Build; Source Build Requirements; Configuring Sources and Building Binaries
3. Basic Indexing
Indexing SQL Data; Main Fetch Query; Pre-Queries, Post-Queries, and Post-Index Queries; How the Various SQL Queries Work Together; Ranged Queries for Larger Data Sets; Indexing XML Data; Index Schemas for XML Data; XML Encodings; xmlpipe2 Elements Reference; Working with Character Sets; Handling Stop Words and Short Words
4. Basic Searching
Matching Modes; Full-Text Query Syntax; Known Operators; Escaping Special Characters; AND and OR Operators and a Notorious Precedence Trap; NOT Operator; Field Limit Operator; Phrase Operator; Keyword Proximity Operator; Quorum Operator; Strict Order (BEFORE) Operator; NEAR Operator; SENTENCE and PARAGRAPH Operators; ZONE Limit Operator; Keyword Modifiers; Result Set Contents and Limits; Searching Multiple Indexes; Result Set Processing; Expressions; Filtering; Sorting; Grouping
5. Managing Indexes
The “Divide and Conquer” Concept; Index Rotation; Picking Documents; Handling Updates and Deletions with K-Lists; Scheduling Rebuilds, and Using Multiple Deltas; Merge Versus Rebuild Versus Deltas; Scripting and Reloading Configurations
6. Relevance and Ranking
Relevance Assessment: A Black Art; Relevance Ranking Functions; Sphinx Rankers Explained; BM25 Factor; Phrase Proximity Factor; Overview of the Available Rankers; Nitty-gritty Ranker Details; How Do I Draw Those Stars?; How Do I Rank Exact Field Matches Higher?; How Do I Force Document D to Rank First?; How Does Sphinx Ranking Compare to System XYZ?; Where to Go from Here
About the Author

SPECIAL OFFER: Upgrade this ebook with O’Reilly
Copyright

Content preview from Introduction to Search with Sphinx

Chapter 1. The World of Text Search

Words frequently have different meanings, and this is evident even in the short description of Sphinx itself. We used to call it a full-text search engine, which is a standard term in the IT knowledge domain. Nevertheless, this occasionally gave the wrong impression that Sphinx was either a Google-competing web service or an embeddable software library that only hardened C++ programmers would ever manage to implement and use. So nowadays, we tend to call Sphinx a search server to stress that it’s a suite of programs running on your hardware that you use to implement and maintain full-text searches, similar to how you use a database server to store and manipulate your data.

Sphinx can serve you in a variety of ways and help with quite a number of search-related tasks, and then some. Data sets range from just a few blog posts to web-scale collections that contain billions of documents; workload levels vary from just a few searches per day on a deserted personal website to about 200 million queries per day on Craigslist; and query types range from simple quick queries that need to return the top 10 matches on a given keyword to sophisticated analytical queries used for data mining tasks that combine thousands of keywords into a complex text query and add a few nontext conditions on top.

So, there are a lot of things that Sphinx can do, and therefore a lot to discuss. But before we begin, let’s ensure that we’re on the ...

Publisher Resources

ISBN: 9780596809546
