Mastering Apache Solr 7.x

Book description

Accelerate your enterprise search engine and bring relevancy to your search analytics

About This Book

  • A practical guide to building expertise with indexing, faceting, clustering, and pagination
  • Seamlessly manage and administer enterprise search applications and services
  • Handle multiple data formats such as JSON, XML, PDF, DOC, XLS, PPT, CSV, and much more

Who This Book Is For

This book is for developers, software engineers, data engineers, and database architects who are building, or seeking to build, effective enterprise-wide search engines for business intelligence. Prior experience with Apache Solr or Java programming is required to get the most out of this book.

What You Will Learn

  • Design schemas using the Schema API to access data in the database
  • Apply advanced querying and fine-tuning techniques for better performance
  • Get to grips with indexing using client APIs
  • Set up a fault-tolerant, highly available server using SolrCloud's distributed capabilities
  • Explore Apache Tika to upload data with Solr Cell
  • Understand the different data operations that can be performed while indexing
  • Master advanced querying through the Velocity search UI, faceting, query re-ranking, pagination, and spatial search
  • Learn to use JavaScript, Python, SolrJ, and Ruby to interact with Solr

In Detail

Apache Solr is a standalone enterprise search server with a REST-like API, providing highly scalable, distributed search and index replication for many of the world's largest internet sites.

To begin with, you will be introduced to full-text search, multiple-filter search, dynamic clustering, and more, helping you brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which improve numerous aspects of performance and make data investigation simpler, easier, and more powerful. You will learn to build complex queries and extensive filters, and see how they are compiled in your system to bring relevance to your search tools. You will learn how Solr scoring works, which elements affect a document's score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries to re-rank documents. You will also learn advanced options that help you understand what content is indexed and how the extracted content is indexed. Throughout the book, you will work through complex problems and their solutions, along with varied approaches to tackling your business needs.

By the end of this book, you will have the advanced proficiency needed to build out-of-the-box smart search solutions for your enterprise's demands.

Style and approach

An advanced guide that takes you through complex problems and their solutions, along with varied approaches to tackling your business needs using Apache Solr 7.x.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Mastering Apache Solr 7.x
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Introduction to Solr 7
    1. Introduction to Solr
      1. History of Solr
      2. Lucene – the backbone of Solr
    2. Why choose Solr?
      1. Benefits of keyword search
      2. Benefits of ranked results
    3. Solr use cases
      1. Social media
      2. Science and research
      3. Search engine
      4. E-commerce
      5. Media and entertainment
      6. Government
      7. Education
    4. What's new in Solr 7?
      1. Replication for SolrCloud
        1. TLOG replicas
        2. PULL replicas
      2. Schemaless improvements
      3. Autoscaling
      4. Default numeric types
      5. Spatial fields
      6. SolrJ
      7. JMX and MBeans
      8. Other changes
    5. Summary
  7. Getting Started
    1. Solr installation
    2. Understanding various files and the folder structure
      1. bin
      2. Solr script
      3. Post script
      4. contrib
        1. DataImportHandler
        2. ContentExtractionLibrary
        3. LanguageIdentifier
        4. Clustering
        5. VelocityIntegration
      5. dist and docs
      6. example
      7. core.properties
      8. zoo.cfg
      9. solr.xml
      10. server
    3. Running Solr
      1. Running basic Solr commands
      2. Production Solr setup
    4. Loading sample data
      1. Loading data from MySQL
    5. Understanding the browse interface
    6. Using the Solr admin interface
      1. Dashboard
      2. Logging
      3. Cloud screens
        1. Tree view
        2. Graph view
      4. Collections or core admin
      5. Java properties
      6. Thread dump
      7. Collection-specific tools
        1. Overview
        2. Analysis
      8. DataImport
      9. Documents
      10. Files
      11. Query
      12. Stream
      13. Schema
      14. Core-specific tools
    7. Summary
  8. Designing Schemas
    1. How Solr works
      1. Getting started with Solr's basics
      2. The schema file of Solr
    2. Understanding field types
      1. Definitions and properties of field types
        1. Field type properties
      2. Field types available in Solr
      3. Understanding date fields
      4. Understanding currencies and exchange rates 
      5. Understanding enum fields
    3. Field management
      1. Field properties
      2. Copying fields
      3. Dynamic fields
    4. Mastering Schema API
      1. Schema API in detail
        1. Schema operations
        2. Listing fields, field types, DynamicFields, and CopyField rules
    5. Deciphering schemaless mode
      1. Creating a schemaless example
      2. Schemaless mode configuration
      3. Managed schema
      4. Field guessing
    6. Summary
  9. Mastering Text Analysis Methodologies
    1. Understanding text analysis
      1. What is text analysis?
      2. How text analysis works
    2. Understanding analyzers
      1. What is an analyzer?
      2. How an analyzer works
    3. Understanding tokenizers
      1. What is a tokenizer?
      2. Available tokenizers in Solr
        1. Standard tokenizer
        2. White space tokenizer
        3. Classic tokenizer
        4. Keyword tokenizer
        5. Lower case tokenizer
        6. Letter tokenizer
        7. N-gram tokenizer
        8. Edge n-gram tokenizer
    4. Understanding filters
      1. What is a filter?
      2. Available filters in Solr
        1. Stop filter
        2. Classic filter
        3. Synonym filter
        4. Synonym graph filter
        5. ASCII folding filter
        6. Keep word filter
        7. KStem filter
        8. KeywordMarkerFilterFactory
        9. Word delimiter graph filter 
      3. Understanding CharFilter
        1. Understanding PatternReplaceCharFilterFactory
    5. Understanding multilingual analysis
      1. Language identification
      2. Configuring Solr for multiple language search
        1. Creating separate fields per language
        2. Creating separate indexes per language
    6. Understanding phonetic matching
      1. Understanding Beider-Morse phonetic matching
    7. Summary
  10. Data Indexing and Operations
    1. Basics of Solr indexing
      1. Installing Postman
      2. Exploring the post tool
    2. Understanding index handlers
      1. Working with an index handler with the XML format
      2. Index handler with JSON
    3. Apache Tika and indexing
      1. Solr Cell basics
      2. Indexing a binary using Tika
    4. Language detection 
      1. Language detection configuration
    5. Client APIs 
    6. Summary
  11. Advanced Queries – Part I
    1. Search relevance
    2. Velocity search UI
    3. Query parsing and syntax
      1. Common query parameters
      2. Standard query parser
        1. Advantage
        2. Disadvantage
          1. Searching terms for standard query parser
        3. Term modifiers
        4. Wildcard searches
        5. Fuzzy searches
        6. Proximity searching 
        7. Range searches
        8. Boolean operators
        9. Escaping special characters
        10. Grouping terms
        11. Dates and times in query strings
        12. Adding comments to the query string
      3. The DisMax Query Parser
        1. Advantages
        2. DisMax query parser parameters
      4. eDisMax Query Parser
    4. Response writer
      1. JSON
      2. Standard XML
      3. CSV
      4. Velocity
    5. Faceting
      1. Common parameters
      2. Field-value faceting parameters
      3. Range faceting
      4. Pivot faceting
      5. Interval faceting
    6. Highlighting
      1. Highlighting parameters
      2. Highlighter
        1. Unified highlighter (hl.method=unified)
        2. Original highlighter (hl.method=original) 
        3. FastVector highlighter (hl.method=fastVector)
      3. Boundary scanners
        1. The breakIterator boundary scanner
        2. The simple boundary scanner
    7. Summary
  12. Advanced Queries – Part II
    1. Spellchecking
      1. Spellcheck parameters
      2. Implementation approaches
        1. IndexBasedSpellChecker
        2. DirectSolrSpellChecker
        3. FileBasedSpellChecker
        4. WordBreakSolrSpellChecker
      3. Distributed spellcheck
    2. Suggester
      1. Suggester parameters
      2. Running suggestions
    3. Pagination
      1. How to implement pagination
      2. Cursor pagination
    4. Result grouping
      1. Result grouping parameters
      2. Running result grouping
    5. Result clustering
      1. Result clustering parameters
      2. Result clustering implementation
        1. Install the clustering contrib
        2. Declare the cluster search component
        3. Declare the request handler and include the cluster search component
    6. Spatial search
      1. Spatial search implementation
        1. Field types
        2. Query parser
        3. Spatial search query parser parameters
      2. Function queries
    7. Summary
  13. Managing and Fine-Tuning Solr
    1. JVM configuration
      1. Managing the memory heap 
    2. Managing solrconfig.xml
      1. User-defined properties
        1. Implicit Solr core properties
    3. Managing backups
      1. Backup in SolrCloud
      2. Standalone mode backups
        1. Backup API
        2. Backup status
        3. API to restore
        4. Restore status API
        5. Snapshot API
    4. JMX with Solr
      1. JMX configuration
    5. Logging configuration
      1. Log settings using the admin web interface
      2. Log level at startup
        1. Setting the environment variable
        2. Passing parameters in the startup script
      3. Configuring Log4J for logging
    6. SolrCloud overview
      1. SolrCloud in interactive mode
      2. SolrCloud – core concepts
      3. Routing documents
      4. Splitting shards
      5. Configuring Solr to ignore commits from client applications
    7. Enabling SSL – Solr security
      1. Prerequisites
      2. Generating a key and self-signed certificate
      3. Starting Solr with SSL system properties
    8. Performance statistics
      1. Statistics for request handlers
    9. Summary
  14. Client APIs – An Overview
    1. Client API overview
    2. JavaScript Client API
    3. SolrJ Client API
    4. Ruby Client API
    5. Python Client API
    6. Summary

Product information

  • Title: Mastering Apache Solr 7.x
  • Author(s): Sandeep Nair, Chintan Mehta, Dharmesh Vasoya
  • Release date: February 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788837385