Learning Elasticsearch

Book Description

Store, search, and analyze your data with ease using Elasticsearch 5.x

About This Book

  • Get to grips with the basics of Elasticsearch concepts and its APIs, and use them to create efficient applications
  • Create large-scale Elasticsearch clusters and perform analytics using aggregation
  • This comprehensive guide will get you up and running with Elasticsearch 5.x in no time

Who This Book Is For

If you want to build efficient search and analytics applications using Elasticsearch, this book is for you. It will also benefit developers who have worked with Lucene or Solr before and now want to work with Elasticsearch. No previous knowledge of Elasticsearch is expected.

What You Will Learn

  • See how to set up and configure Elasticsearch and Kibana
  • Know how to ingest structured and unstructured data using Elasticsearch
  • Understand how a search engine works and the concepts of relevance and scoring
  • Find out how to query Elasticsearch with a high degree of performance and scalability
  • Improve the user experience by using autocomplete, geolocation queries, and much more
  • See how to slice and dice your data using Elasticsearch aggregations
  • Grasp how to use Kibana to explore and visualize your data
  • Know how to host on Elastic Cloud and how to use the latest X-Pack features such as Graph and Alerting

In Detail

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, open source search and analytics engine. You can use it for applications of any size, from small deployments to those holding billions of documents. It is built to scale horizontally and can handle both structured and unstructured data. Packed with easy-to-follow examples, this book will give you a firm understanding of the basics of Elasticsearch and show you how to use its capabilities efficiently.

You will install and set up Elasticsearch and Kibana, and handle documents using the Distributed Document Store. You will see how to query, search, and index your data, perform aggregation-based analytics, and use Kibana to explore and visualize your data.

Further on, you will learn to handle document relationships, work with geospatial data, and much more. Finally, you will see how to set up and scale your Elasticsearch clusters in production environments.

Style and approach

This comprehensive guide will get you started with Elasticsearch 5.x and help you build a solid understanding of the basics. Every topic is explained in depth and supplemented with practical examples.

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Introduction to Elasticsearch
    1. Basic concepts of Elasticsearch
      1. Document
      2. Index
      3. Type
      4. Cluster and node
      5. Shard
    2. Interacting with Elasticsearch
      1. Creating a document
      2. Retrieving an existing document
      3. Updating an existing document
        1. Updating a partial document
      4. Deleting an existing document
    3. How does search work?
      1. Importance of information retrieval
      2. Simple search query
      3. Inverted index
        1. Stemming
        2. Synonyms
        3. Phrase search
      4. Apache Lucene
    4. Scalability and availability
      1. Relation between node, index, and shard
        1. Three shards with zero replicas
        2. Six shards with zero replicas
        3. Six shards with one replica
      2. Distributed search
      3. Failure handling
      4. Strengths and limitations of Elasticsearch
    5. Summary
  3. Setting Up Elasticsearch and Kibana
    1. Installing Elasticsearch
      1. Installing Java
      2. Windows
        1. Starting and stopping Elasticsearch
      3. Mac OS X
        1. Starting and stopping Elasticsearch
      4. DEB and RPM packages
        1. Debian package
        2. RPM package
        3. Starting and stopping Elasticsearch
      5. Sample configuration files
      6. Verifying Elasticsearch is running
    2. Installing Kibana
      1. Mac OS X
        1. Starting and stopping Kibana
      2. Windows
        1. Starting and stopping Kibana
    3. Query format used in this book (Kibana Console)
    4. Using cURL or Postman
    5. Health of the cluster
    6. Summary
  4. Modeling Your Data and Document Relations
    1. Mapping
      1. Dynamic mapping
      2. Create index with mapping
      3. Adding a new type/field
      4. Getting the existing mapping
      5. Mapping conflicts
      6. Data type
      7. Metafields
      8. How to handle null values
      9. Storing the original document
      10. Searching all the fields in the document
    2. Difference between full-text search and exact match
    3. Core data types
      1. Text
      2. Keyword
      3. Date
      4. Numeric
      5. Boolean
      6. Binary
    4. Complex data types
      1. Array
      2. Object
      3. Nested
    5. Geo data type
      1. Geo-point data type
    6. Specialized data type
      1. IP
    7. Mapping the same field with different mappings
    8. Handling relations between different document types
      1. Parent-child document relation
        1. How are parent-child documents stored internally?
      2. Nested
    9. Routing
    10. Summary
  5. Indexing and Updating Your Data
    1. Indexing your data
      1. Indexing errors
        1. Node/shards errors
        2. Serialization/mapping errors
        3. Thread pool rejection error
      2. Managing an index
      3. What happens when you index a document?
    2. Updating your data
      1. Update using an entire document
      2. Partial updates
      3. Scripted updates
      4. Upsert
      5. NOOP
      6. What happens when you update a document?
        1. Merging segments
    3. Using Kibana to discover
    4. Using Elasticsearch in your application
      1. Java
        1. Transport client
          1. Dependencies
          2. Initializing the client
          3. Sniffing
        2. Node client
        3. REST client
        4. Third party clients
        5. Indexing using Java client
    5. Concurrency
    6. Translog
      1. Async versus sync
      2. CRUD from translog
    7. Primary and Replica shards
      1. Primary preference
      2. More replicas for query throughput
      3. Increasing/decreasing the number of replicas
    8. Summary
  6. Organizing Your Data and Bulk Data Ingestion
    1. Bulk operations
      1. Bulk API
      2. Multi Get API
      3. Update by query
      4. Delete by query
    2. Reindex API
      1. Change mappings/settings
      2. Combining documents from one or more indices
      3. Copying only missing documents
      4. Copying a subset of documents into a new index
      5. Copying top N documents
      6. Copying the subset of fields into new index
    3. Ingest Node
    4. Organizing your data
      1. Index alias
      2. Index templates
      3. Managing time-based indices
    5. Shrink API
    6. Summary
  7. All About Search
    1. Different types of queries
    2. Sample data
    3. Querying Elasticsearch
      1. Basic query (finding the exact value)
      2. Pagination
      3. Sorting based on existing fields
      4. Selecting the fields in the response
      5. Querying based on range
      6. Handling dates
      7. Analyzed versus non-analyzed fields
      8. Term versus Match query
      9. Match phrase query
      10. Prefix and match phrase prefix query
      11. Wildcard and Regular expression query
      12. Exists and missing queries
      13. Using more than one query
      14. Routing
      15. Debugging search query
    4. Relevance
      1. Queries versus Filters
      2. How to boost relevance based on a single field
      3. How to boost score based on queries
      4. How to boost relevance using decay functions
      5. Rescoring
      6. Debugging relevance score
    5. Searching for same value across multiple fields
      1. Best matching fields
      2. Most matching fields
      3. Cross-matching fields
    6. Caching
      1. Node Query cache
      2. Shard request cache
    7. Summary
  8. More Than a Search Engine (Geofilters, Autocomplete, and More)
    1. Sample data
    2. Correcting typos and spelling mistakes
      1. Fuzzy query
    3. Making suggestions based on the user input
      1. Implementing "did you mean" feature
        1. Term suggester
        2. Phrase suggester
      2. Implementing the autocomplete feature
    4. Highlighting
    5. Handling document relations using parent-child
      1. The has_parent query
      2. The has_child query
      3. Inner hits for parent-child
      4. How parent-child works internally
    6. Handling document relations using nested
      1. Inner hits for nested documents
    7. Scripting
      1. Script Query
    8. Post Filter
    9. Reverse search using the percolate query
    10. Geo and Spatial Filtering
      1. Geo Distance
        1. Using Geolocation to rank the search results
      2. Geo Bounding Box
      3. Sorting
    11. Multi search
    12. Search templates
    13. Querying Elasticsearch from Java application
    14. Summary
  9. How to Slice and Dice Your Data Using Aggregations
    1. Aggregation basics
      1. Sample data
      2. Query structure
      3. Multilevel aggregations
    2. Types of aggregations
      1. Terms aggregations (group by)
        1. Size and error
        2. Order
        3. Minimum document count
        4. Missing values
      2. Aggregations based on filters
      3. Aggregations on dates (range, histogram)
      4. Aggregations on numeric values (range, histogram)
      5. Aggregations on geolocation (distance, bounds)
        1. Geo distance
        2. Geo bounds
      6. Aggregations on child documents
      7. Aggregations on nested documents
        1. Reverse nested aggregation
      8. Post filter
    3. Using Kibana to visualize aggregations
    4. Caching
    5. Doc values
    6. Field data
    7. Summary
  10. Production and Beyond
    1. Configuring Elasticsearch
      1. The directory structure
        1. zip/tar.gz
        2. DEB/RPM
      2. Configuration file
      3. Cluster and node name
      4. Network configuration
      5. Memory configuration
      6. Configuring file descriptors
      7. Types of nodes
    2. Multinode cluster
      1. Inspecting the logs
    3. How nodes discover each other
      1. Node failures
    4. X-Pack
      1. Windows
      2. Mac OS X
      3. Debian/RPM
      4. Authentication
      5. X-Pack basic license
    5. Monitoring
      1. Monitoring Elasticsearch clusters
      2. Monitoring indices
      3. Monitoring nodes
    6. Thread pools
    7. Elasticsearch server logs
      1. Slow logs
    8. Summary
  11. Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting)
    1. Elastic Cloud
      1. High availability
      2. Data reliability
    2. Security
      1. Authentication and roles
      2. Securing communications using SSL
    3. Graph
      1. Graph UI
    4. Alerting
    5. Summary