Elasticsearch Indexing

Book Description

Improve search experiences with ElasticSearch’s powerful indexing functionality – learn how with this practical ElasticSearch tutorial, packed with tips!

About This Book

  • Improve users’ search experience with the correct configuration
  • Deliver relevant search results – fast!
  • Save time and system resources by creating stable clusters

Who This Book Is For

If you understand the importance of a great search experience, this book will show you exactly how to build one with ElasticSearch, one of the world’s leading search servers.

What You Will Learn

  • Learn how ElasticSearch efficiently stores data – and find out how it can reduce costs
  • Control document metadata with the correct mapping strategies and by configuring indices
  • Use ElasticSearch analysis and analyzers to incorporate greater intelligence and organization across your documents and data
  • Find out how an ElasticSearch cluster works – and learn the best way to configure it
  • Perform high-speed indexing with low system resource cost
  • Improve query relevance with appropriate mapping, suggest API, and other ElasticSearch functionalities

In Detail

Beginning with an overview of the way ElasticSearch stores data, you’ll extend your knowledge to tackle indexing and mapping, and learn how to configure ElasticSearch to meet your users’ needs. You’ll then find out how to use analysis and analyzers for greater intelligence in how you organize and retrieve search results, so that search queries are met with relevant results. You’ll explore the anatomy of an ElasticSearch cluster, and learn how to set up configurations that give you optimum availability as well as scalability. Once you’ve learned how these elements work, you’ll find real-world solutions to help you improve indexing performance, as well as tips and guidance on safety so you can back up and restore data. By the end, you will be confident that you can help to deliver an improved search experience, which is exactly what modern users demand and expect.

Style and approach

This is a comprehensive guide to performing efficient indexing and providing relevant search results using mapping, analyzers, and other ElasticSearch functionalities.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Elasticsearch Indexing
    1. Table of Contents
    2. Elasticsearch Indexing
    3. Credits
    4. About the Author
    5. About the Reviewer
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Introduction to Efficient Indexing
      1. Getting started
      2. Understanding the document storage strategy
        1. The _source field
        2. The difference between the storable and searchable field
      3. Analysis
      4. Summary
    9. 2. What is an Elasticsearch Index
      1. Nature of the Elasticsearch index
        1. Indices
        2. Mapping
        3. Types
      2. Document
        1. Denormalization
        2. Inverted index
      3. Summary
    10. 3. Basic Concepts of Mapping
      1. Basic concepts and definitions
        1. Metadata fields
          1. _source
          2. _all
          3. _timestamp
          4. _ttl
      2. Types
        1. Object type
          1. Root object type
        2. Attachment type
      3. The relationship between mapping and relevant search results
      4. Understanding the schema-less
      5. Summary
    11. 4. Analysis and Analyzers
      1. Introducing analysis
      2. Process of analysis
      3. Built-in analyzers
        1. Building blocks of Analyzer
        2. Character filters
          1. HTML Strip Char filter
          2. Pattern Replace Char filter
        3. Tokenizer
        4. Token filters
      4. What's text normalization?
      5. ICU analysis plugin
        1. ASCII Folding Token filter
      6. An Analyzer Pipeline
      7. Specifying the analyzer for a field in the mapping
        1. Creating a custom analyzer
      8. Summary
    12. 5. Anatomy of an Elasticsearch Cluster
      1. Basic concepts
      2. Node
        1. Non-data nodes
          1. Dedicated master nodes
          2. Client nodes
        2. Tribe node
      3. Shards
      4. Replicas
      5. Explaining the architecture of distribution
      6. Correctly configuring the cluster
      7. Choosing the right amount of shards and replicas
      8. Summary
    13. 6. Improving Indexing Performance
      1. Configuration
        1. Memory configuration
          1. The ES_HEAP_SIZE environment variable
        2. Avoiding swapping
          1. Mlockall property
        3. Garbage collector
        4. The structure of JVM memory
          1. What is the problem?
          2. Monitoring garbage collection
          3. VisualVM
          4. Different strategies among garbage collectors
          5. Process of deallocating memory
          6. Types of garbage collector
            1. Serial garbage collector
            2. Parallel garbage collector
            3. Concurrent Mark Sweep garbage collector
            4. G1 garbage collector
            5. Tuning the garbage collection
        5. File descriptors
          1. Increasing FD limit on Unix systems
      2. Optimization of mapping definition
        1. Norms
        2. Feature index_option of string type
        3. Exclude unnecessary fields
        4. Extension of the automatic index refresh time
      3. Segments and merging policies
        1. Choosing the right merge policy
          1. Tiered policy
          2. log_byte_size policy
          3. Log_doc policy
        2. The optimize API
      4. Store module
        1. Store types
          1. Simple filesystem store
          2. New IO filesystem store
          3. MMap filesystem store
          4. Hybrid filesystem store
        2. Throttling I/O operations
          1. Throttling type
      5. Bulk API
        1. Bulk sizing
      6. Notes
      7. Summary
    14. 7. Snapshot and Restore
      1. Snapshot repository
        1. Repository types
          1. Shared filesystem repository
          2. URL repository
          3. Cloud repository
          4. HDFS filesystem repository
      2. Snapshot
      3. Restore
        1. Overriding index settings during restore
      4. How does the snapshot process work?
      5. Summary
    15. 8. Improving the User Search Experience
      1. Correction of users' spelling mistakes
        1. Suggesters
        2. Using the _suggest REST endpoint
          1. Suggest object inclusion in the query
        3. Term suggester
          1. Configuring the term suggester
            1. Common suggest options
            2. Other and additional term suggester options
        4. The phrase suggester
          1. Configuring the phrase suggester
        5. The completion suggester
          1. Mapping the configuration for the completion suggester
          2. Indexing on completion field
      2. Get suggestions
      3. Improving the relevancy of search results
        1. Boosting the query
        2. Bool query
        3. Synonyms
        4. Be careful about the _all field
      4. Summary
    16. Index