Solr in Action

Book description

Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities.



About the Technology

Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents.

About the Book

Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning.



What's Inside
  • How to scale Solr for big data
  • Rich real-world examples
  • Solr as a NoSQL data store
  • Advanced multilingual, data, and relevancy tricks
  • Coverage of versions through Solr 4.7


About the Reader

This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required.



About the Authors

Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies.



Quotes
The knowledge and techniques you need.
- From the Foreword by Yonik Seeley, Creator of Solr

Readable and immediately applicable ... an excellent book.
- John Viviano, InterCorp, Inc.

The go-to guide for Solr ... a definitive resource for both beginners and experts.
- Scott Anthony, Business Instruments

A well-dosed combination of deep technical knowledge and real-world experience.
- Alexandre Madurell, Piksel, Inc.

Table of contents

  1. Copyright
  2. Brief Table of Contents
  3. Table of Contents
  4. Foreword
  5. Preface
  6. Acknowledgments
  7. About this Book
  8. Part 1. Meet Solr
    1. Chapter 1. Introduction to Solr
      1. 1.1. Why do I need a search engine?
      2. 1.2. What is Solr?
      3. 1.3. Why Solr?
      4. 1.4. Features overview
      5. 1.5. Summary
    2. Chapter 2. Getting to know Solr
      1. 2.1. Getting started
      2. 2.2. Searching is what it’s all about
      3. 2.3. Tour of the Solr administration console
      4. 2.4. Adapting the example to your needs
      5. 2.5. Summary
    3. Chapter 3. Key Solr concepts
      1. 3.1. Searching, matching, and finding content
      2. 3.2. Relevancy
      3. 3.3. Precision and Recall
      4. 3.4. Searching at scale
      5. 3.5. Summary
    4. Chapter 4. Configuring Solr
      1. 4.1. Overview of solrconfig.xml
      2. 4.2. Query request handling
      3. 4.3. Managing searchers
      4. 4.4. Cache management
      5. 4.5. Remaining configuration options
      6. 4.6. Summary
    5. Chapter 5. Indexing
      1. 5.1. Example microblog search application
      2. 5.2. Designing your schema
      3. 5.3. Defining fields in schema.xml
      4. 5.4. Field types for structured nontext fields
      5. 5.5. Sending documents to Solr for indexing
      6. 5.6. Update handler
      7. 5.7. Index management
      8. 5.8. Summary
    6. Chapter 6. Text analysis
      1. 6.1. Analyzing microblog text
      2. 6.2. Basic text analysis
      3. 6.3. Defining a custom field type for microblog text
      4. 6.4. Advanced text analysis
      5. 6.5. Summary
  9. Part 2. Core Solr capabilities
    1. Chapter 7. Performing queries and handling results
      1. 7.1. The anatomy of a Solr request
      2. 7.2. Working with query parsers
      3. 7.3. Queries and filters
      4. 7.4. The default query parser (Lucene query parser)
      5. 7.5. Handling user queries (eDisMax query parser)
      6. 7.6. Other useful query parsers
      7. 7.7. Returning results
      8. 7.8. Sorting results
      9. 7.9. Debugging query results
      10. 7.10. Summary
    2. Chapter 8. Faceted search
      1. 8.1. Navigating your content at a glance
      2. 8.2. Setting up test data
      3. 8.3. Field faceting
      4. 8.4. Query faceting
      5. 8.5. Range faceting
      6. 8.6. Filtering upon faceted values
      7. 8.7. Multiselect faceting, keys, and tags
      8. 8.8. Beyond the basics
      9. 8.9. Summary
    3. Chapter 9. Hit highlighting
      1. 9.1. Overview of hit highlighting
      2. 9.2. How highlighting works
      3. 9.3. Improving performance using FastVectorHighlighter
      4. 9.4. PostingsHighlighter
      5. 9.5. Summary
    4. Chapter 10. Query suggestions
      1. 10.1. Spell-check
      2. 10.2. Autosuggesting query terms
      3. 10.3. Suggesting document field values
      4. 10.4. Suggesting queries based on user activity
      5. 10.5. Summary
    5. Chapter 11. Result grouping/field collapsing
      1. 11.1. Result grouping vs. field collapsing
      2. 11.2. Skipping duplicate documents
      3. 11.3. Returning multiple documents per group
      4. 11.4. Grouping by functions and queries
      5. 11.5. Paging and sorting grouped results
      6. 11.6. Grouping gotchas
      7. 11.7. Efficient field collapsing with the Collapsing query parser
      8. 11.8. Summary
    6. Chapter 12. Taking Solr to production
      1. 12.1. Developing a Solr distribution
      2. 12.2. Deploying Solr
      3. 12.3. Hardware and server configuration
      4. 12.4. Data acquisition strategies
      5. 12.5. Sharding and replication
      6. 12.6. Solr core management
      7. 12.7. Managing clusters of servers
      8. 12.8. Querying and interacting with Solr
      9. 12.9. Monitoring Solr’s performance
      10. 12.10. Upgrading between Solr versions
      11. 12.11. Summary
  10. Part 3. Taking Solr to the next level
    1. Chapter 13. SolrCloud
      1. 13.1. Getting started with SolrCloud
      2. 13.2. Core concepts
      3. 13.3. Distributed indexing
      4. 13.4. Distributed search
      5. 13.5. Collections API
      6. 13.6. Basic system-administration tasks
      7. 13.7. Advanced topics
      8. 13.8. Summary
    2. Chapter 14. Multilingual search
      1. 14.1. Why linguistic analysis matters
      2. 14.2. Stemming vs. lemmatization
      3. 14.3. Stemming in action
      4. 14.4. Handling edge cases
      5. 14.5. Available language libraries in Solr
      6. 14.6. Searching content in multiple languages
      7. 14.7. Language identification
      8. 14.8. Summary
    3. Chapter 15. Complex query operations
      1. 15.1. Function queries
      2. 15.2. Geospatial search
      3. 15.3. Pivot faceting
      4. 15.4. Referencing external data
      5. 15.5. Cross-document and cross-index joins
      6. 15.6. Big data analytics with Solr
      7. 15.7. Summary
    4. Chapter 16. Mastering relevancy
      1. 16.1. The impact of relevancy tuning
      2. 16.2. Debugging the relevancy calculation
      3. 16.3. Relevancy boosting
      4. 16.4. Pluggable Similarity class implementations
      5. 16.5. Personalized search and recommendations
      6. 16.6. Creating a personalized search experience
      7. 16.7. Running relevancy experiments
      8. 16.8. Summary
  11. Appendix A. Working with the Solr codebase
    1. A.1. Pulling the right version of Solr
    2. A.2. Setting up Solr in your IDE
    3. A.3. Debugging Solr code
    4. A.4. Downloading and applying Solr patches
    5. A.5. Contributing patches
  12. Appendix B. Language-specific field type configurations
  13. Appendix C. Useful data import configurations
    1. C.1. Indexing Wikipedia
    2. C.2. Indexing Stack Exchange
  14. Index
  15. List of Figures
  16. List of Tables
  17. List of Listings

Product information

  • Title: Solr in Action
  • Author(s): Trey Grainger, Timothy Potter
  • Release date: March 2014
  • Publisher(s): Manning Publications
  • ISBN: 9781617291029