Solr in Action

ISBN: 9781617291029

by Timothy Potter, Trey Grainger

March 2014

Intermediate to advanced

664 pages

21h 15m

English

Manning Publications

Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this Book
Part 1. Meet Solr
Chapter 1. Introduction to Solr
1.1. Why do I need a search engine?
1.2. What is Solr?
1.3. Why Solr?
1.4. Features overview
1.5. Summary
Chapter 2. Getting to know Solr
2.1. Getting started
2.2. Searching is what it’s all about
2.3. Tour of the Solr administration console
2.4. Adapting the example to your needs
2.5. Summary
Chapter 3. Key Solr concepts
3.1. Searching, matching, and finding content
3.2. Relevancy
3.3. Precision and Recall
3.4. Searching at scale
3.5. Summary
Chapter 4. Configuring Solr
4.1. Overview of solrconfig.xml
4.2. Query request handling
4.3. Managing searchers
4.4. Cache management
4.5. Remaining configuration options
4.6. Summary
Chapter 5. Indexing
5.1. Example microblog search application
5.2. Designing your schema
5.3. Defining fields in schema.xml
5.4. Field types for structured nontext fields
5.5. Sending documents to Solr for indexing
5.6. Update handler
5.7. Index management
5.8. Summary
Chapter 6. Text analysis
6.1. Analyzing microblog text
6.2. Basic text analysis
6.3. Defining a custom field type for microblog text
6.4. Advanced text analysis
6.5. Summary
Part 2. Core Solr capabilities
Chapter 7. Performing queries and handling results
7.1. The anatomy of a Solr request
7.2. Working with query parsers
7.3. Queries and filters
7.4. The default query parser (Lucene query parser)
7.5. Handling user queries (eDisMax query parser)
7.6. Other useful query parsers
7.7. Returning results
7.8. Sorting results
7.9. Debugging query results
7.10. Summary
Chapter 8. Faceted search
8.1. Navigating your content at a glance
8.2. Setting up test data
8.3. Field faceting
8.4. Query faceting
8.5. Range faceting
8.6. Filtering upon faceted values
8.7. Multiselect faceting, keys, and tags
8.8. Beyond the basics
8.9. Summary
Chapter 9. Hit highlighting
9.1. Overview of hit highlighting
9.2. How highlighting works
9.3. Improving performance using FastVectorHighlighter
9.4. PostingsHighlighter
9.5. Summary
Chapter 10. Query suggestions
10.1. Spell-check
10.2. Autosuggesting query terms
10.3. Suggesting document field values
10.4. Suggesting queries based on user activity
10.5. Summary
Chapter 11. Result grouping/field collapsing
11.1. Result grouping vs. field collapsing
11.2. Skipping duplicate documents
11.3. Returning multiple documents per group
11.4. Grouping by functions and queries
11.5. Paging and sorting grouped results
11.6. Grouping gotchas
11.7. Efficient field collapsing with the Collapsing query parser
11.8. Summary
Chapter 12. Taking Solr to production
12.1. Developing a Solr distribution
12.2. Deploying Solr
12.3. Hardware and server configuration
12.4. Data acquisition strategies
12.5. Sharding and replication
12.6. Solr core management
12.7. Managing clusters of servers
12.8. Querying and interacting with Solr
12.9. Monitoring Solr’s performance
12.10. Upgrading between Solr versions
12.11. Summary
Part 3. Taking Solr to the next level
Chapter 13. SolrCloud
13.1. Getting started with SolrCloud
13.2. Core concepts
13.3. Distributed indexing
13.4. Distributed search
13.5. Collections API
13.6. Basic system-administration tasks
13.7. Advanced topics
13.8. Summary
Chapter 14. Multilingual search
14.1. Why linguistic analysis matters
14.2. Stemming vs. lemmatization
14.3. Stemming in action
14.4. Handling edge cases
14.5. Available language libraries in Solr
14.6. Searching content in multiple languages
14.7. Language identification
14.8. Summary
Chapter 15. Complex query operations
15.1. Function queries
15.2. Geospatial search
15.3. Pivot faceting
15.4. Referencing external data
15.5. Cross-document and cross-index joins
15.6. Big data analytics with Solr
15.7. Summary
Chapter 16. Mastering relevancy
16.1. The impact of relevancy tuning
16.2. Debugging the relevancy calculation
16.3. Relevancy boosting
16.4. Pluggable Similarity class implementations
16.5. Personalized search and recommendations
16.6. Creating a personalized search experience
16.7. Running relevancy experiments
16.8. Summary
Appendix A. Working with the Solr codebase
A.1. Pulling the right version of Solr
A.2. Setting up Solr in your IDE
A.3. Debugging Solr code
A.4. Downloading and applying Solr patches
A.5. Contributing patches
Appendix B. Language-specific field type configurations
Appendix C. Useful data import configurations
C.1. Indexing Wikipedia
C.2. Indexing Stack Exchange
Index
List of Figures
List of Tables
List of Listings

Overview

Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities.

About the Technology

Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents.

About the Book

Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning.

What's Inside

How to scale Solr for big data
Rich real-world examples
Solr as a NoSQL data store
Advanced multilingual, data, and relevancy tricks
Coverage of versions through Solr 4.7

About the Reader

This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required.

About the Authors

Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies.

Quotes
The knowledge and techniques you need.
- From the Foreword by Yonik Seeley, Creator of Solr

Readable and immediately applicable ... an excellent book.
- John Viviano, InterCorp, Inc.

The go-to guide for Solr ... a definitive resource for both beginners and experts.
- Scott Anthony, Business Instruments

A well-dosed combination of deep technical knowledge and real-world experience.
- Alexandre Madurell, Piksel, Inc.

