Advanced Elasticsearch 7.0

Book description

Master the intricacies of Elasticsearch 7.0 and use it to create flexible and scalable search solutions

Key Features

  • Master the latest distributed search and analytics capabilities of Elasticsearch 7.0
  • Perform searching, indexing, and aggregation of your data at scale
  • Discover tips and techniques for speeding up your search query performance

Book Description

Building enterprise-grade distributed applications and executing systematic search operations call for a strong understanding of Elasticsearch and expertise in using its core APIs and latest features. This book will help you master the advanced functionalities of Elasticsearch and understand how to develop a sophisticated, real-time search engine with confidence. You'll also learn to run machine learning jobs in Elasticsearch to speed up routine tasks.

You'll get started by learning to use Elasticsearch features on Hadoop and Spark and to speed up queries, enhancing the customer experience. You'll then get up to speed with performing analytics by building a metrics pipeline, defining queries, and using Kibana for intuitive visualizations that give decision-makers better insights. Later, the book guides you through using Logstash, with examples, to collect, parse, and enrich logs before indexing them in Elasticsearch.

By the end of this book, you will have comprehensive knowledge of advanced topics such as Apache Spark support, machine learning using Elasticsearch and scikit-learn, and real-time analytics, along with the expertise you need to increase business productivity, perform analytics, and get the very best out of Elasticsearch.

What you will learn

  • Pre-process documents before indexing in ingest pipelines
  • Learn how to model your data in the real world
  • Get to grips with using Elasticsearch for exploratory data analysis
  • Understand how to build analytics and RESTful services
  • Use Kibana, Logstash, and Beats for dashboard applications
  • Get up to speed with Spark and Elasticsearch for real-time analytics
  • Explore the basics of Spring Data Elasticsearch, and understand how to index, search, and query in a Spring application

Who this book is for

This book is for Elasticsearch developers and data engineers who want to take their basic knowledge of Elasticsearch to the next level and use it to build enterprise-grade distributed search applications. Prior experience with Elasticsearch will help you get the most out of this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Advanced Elasticsearch 7.0
  3. Dedication
  4. About Packt
    1. Why subscribe?
  5. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Section 1: Fundamentals and Core APIs
  8. Overview of Elasticsearch 7
    1. Preparing your environment
    2. Running Elasticsearch
      1. Basic Elasticsearch configuration
      2. Important system configuration
    3. Talking to Elasticsearch
      1. Using Postman to work with the Elasticsearch REST API
    4. Elasticsearch architectural overview
      1. Elastic Stack architecture
      2. Elasticsearch architecture
        1. Between the Elasticsearch index and the Lucene index
    5. Key concepts
      1. Mapping concepts across SQL and Elasticsearch
      2. Mapping
      3. Analyzer
        1. Standard analyzer
    6. API conventions
    7. New features
      1. New features to be discussed
      2. New features with description and issue number
    8. Breaking changes
      1. Aggregations changes
      2. Analysis changes
      3. API changes
      4. Cluster changes
      5. Discovery changes
      6. High-level REST client changes
      7. Low-level REST client changes
      8. Indices changes
      9. Java API changes
      10. Mapping changes
      11. ML changes
      12. Packaging changes
      13. Search changes
      14. Query DSL changes
      15. Settings changes
      16. Scripting changes
    9. Migration between versions
    10. Summary
  9. Index APIs
    1. Index management APIs
      1. Basic CRUD APIs
    2. Index settings
      1. Index templates
    3. Index aliases
      1. Reindexing with zero downtime
      2. Grouping multiple indices
      3. Views on a subset of documents
      4. Miscellaneous
    4. Monitoring indices
      1. Indices stats
      2. Indices segments, recovery, and shard stores
    5. Index persistence
    6. Advanced index management APIs
      1. Split index
      2. Shrink index
      3. Rollover index
    7. Summary
  10. Document APIs
    1. The Elasticsearch document life cycle
      1. What is a document?
      2. The document life cycle
    2. Single document management APIs
      1. Sample documents
      2. Indexing a document
      3. Retrieving a document by identifier
      4. Updating a document
      5. Removing a document by identifier
    3. Multi-document management APIs
      1. Retrieving multiple documents
      2. Bulk API
      3. Update by query API
      4. Delete by query API
      5. Reindex API
        1. Copying documents
    4. Migration from a multiple mapping types index
    5. Summary
  11. Mapping APIs
    1. Dynamic mapping
      1. Mapping rules
      2. Dynamic templates
    2. Meta fields in mapping
    3. Field datatypes
      1. Static mapping for the sample document
    4. Mapping parameters
    5. Refreshing mapping changes for static mapping
    6. Typeless APIs working with old custom index types
    7. Summary
  12. Anatomy of an Analyzer
    1. An analyzer's components
    2. Character filters
      1. The html_strip filter
      2. The mapping filter
      3. The pattern_replace filter
    3. Tokenizers
    4. Token filters
    5. Built-in analyzers
    6. Custom analyzers
    7. Normalizers
    8. Summary
  13. Search APIs
    1. Indexing sample documents
    2. Search APIs
      1. URI search
      2. Request body search
        1. The sort parameter
        2. The scroll parameter
        3. The search_after parameter
        4. The rescore parameter
        5. The _name parameter
        6. The collapse parameter
        7. The highlighting parameter
        8. Other search parameters
    3. Query DSL
      1. Full text queries
        1. The match keyword
        2. The query string keyword
        3. The intervals keyword
      2. Term-level queries
      3. Compound queries
      4. The script query
    4. The multi-search API
    5. Other search-related APIs
      1. The _explain API
      2. The _validate API
      3. The _count API
      4. The field capabilities API
      5. Profiler
      6. Suggesters
    6. Summary
  14. Section 2: Data Modeling, Aggregations Framework, Pipeline, and Data Analytics
  15. Modeling Your Data in the Real World
    1. The Investor Exchange Cloud
    2. Modeling data and the approaches
      1. Data denormalization
      2. Using an array of objects datatype
      3. Nested object mapping datatypes
      4. Join datatypes
        1. Parent ID query
        2. has_child query
        3. has_parent query
    3. Practical considerations
    4. Summary
  16. Aggregation Frameworks
    1. ETF historical data preparation
    2. Aggregation query syntax
    3. Matrix aggregations
      1. Matrix stats
    4. Metrics aggregations
      1. avg
      2. weighted_avg
      3. cardinality
      4. value_count
      5. sum
      6. min
      7. max
      8. stats
      9. extended_stats
      10. top_hits
      11. percentiles
      12. percentile_ranks
      13. median_absolute_deviation
      14. geo_bounds
      15. geo_centroid
      16. scripted_metric
    5. Bucket aggregations
      1. histogram
      2. date_histogram
      3. auto_date_histogram
      4. range
      5. date_range
      6. ip_range
      7. filter
      8. filters
      9. terms
      10. significant_terms
      11. significant_text
      12. sampler
      13. diversified_sampler
      14. nested
      15. reverse_nested
      16. global
      17. missing
      18. composite
      19. adjacency_matrix
      20. parent
      21. children
      22. geo_distance
      23. geohash_grid
      24. geotile_grid
    6. Pipeline aggregations
      1. Sibling family
        1. avg_bucket
        2. max_bucket
        3. min_bucket
        4. sum_bucket
        5. stats_bucket
        6. extended_stats_bucket
        7. percentiles_bucket
      2. Parent family
        1. cumulative_sum
        2. derivative
        3. bucket_script
        4. bucket_selector
        5. bucket_sort
        6. serial_diff
        7. Moving average aggregation
          1. simple
          2. linear
          3. ewma
          4. holt
          5. holt_winters
        8. Moving function aggregation
          1. max
          2. min
          3. sum
          4. stdDev
          5. unweightedAvg
          6. linearWeightedAvg
          7. ewma
          8. holt
          9. holtWinters
    7. Post filter on aggregations
    8. Summary
  17. Preprocessing Documents in Ingest Pipelines
    1. Ingest APIs
    2. Accessing data in pipelines
    3. Processors
    4. Conditional execution in pipelines
    5. Handling failures in pipelines
    6. Summary
  18. Using Elasticsearch for Exploratory Data Analysis
    1. Business analytics
    2. Operational data analytics
    3. Sentiment analysis
    4. Summary
  19. Section 3: Programming with the Elasticsearch Client
  20. Elasticsearch from Java Programming
    1. Overview of Elasticsearch Java REST client
    2. The Java low-level REST client
      1. The Java low-level REST client workflow
        1. REST client initialization
        2. Performing requests using a REST client
        3. Handling responses
      2. Testing with Swagger UI
      3. New features
    3. The Java high-level REST client
      1. The Java high-level REST client workflow
        1. REST client initialization
        2. Performing requests using the REST client
        3. Handling responses
      2. Testing with Swagger UI
      3. New features
    4. Spring Data Elasticsearch
    5. Summary
  21. Elasticsearch from Python Programming
    1. Overview of the Elasticsearch Python client
    2. The Python low-level Elasticsearch client
      1. Workflow for the Python low-level Elasticsearch client
        1. Client initialization
        2. Performing requests
        3. Handling responses
    3. The Python high-level Elasticsearch library
      1. Illustrating the programming concept
        1. Initializing a connection
        2. Performing requests
        3. Handling responses
      2. The query class
      3. The aggregations class
    4. Summary
  22. Section 4: Elastic Stack
  23. Using Kibana, Logstash, and Beats
    1. Overview of the Elastic Stack
      1. Running the Elastic Stack with Docker
    2. Running Elasticsearch in a Docker container
    3. Running Kibana in a Docker container
    4. Running Logstash in a Docker container
    5. Running Beats in a Docker container
    6. Summary
  24. Working with Elasticsearch SQL
    1. Overview
    2. Getting started
    3. Elasticsearch SQL language
      1. Reserved keywords
      2. Data type
      3. Operators
      4. Functions
        1. Aggregate
        2. Grouping
        3. Date-time
        4. Full-text search
        5. Mathematics
        6. String
        7. Type conversion
        8. Conditional
        9. System
      5. Elasticsearch SQL query syntax
      6. New features
    4. Elasticsearch SQL REST API
    5. Elasticsearch SQL JDBC
      1. Upgrading Elasticsearch from a basic to a trial license
      2. Workflow of Elasticsearch SQL JDBC
      3. Testing with Swagger UI
    6. Summary
  25. Working with Elasticsearch Analysis Plugins
    1. What are Elasticsearch plugins?
      1. Plugin management
    2. Working with the ICU Analysis plugin
      1. Examples
    3. Working with the Smart Chinese Analysis plugin
      1. Examples
    4. Working with the IK Analysis plugin
      1. Examples
      2. Configuring a custom dictionary in the IK Analysis plugin
    5. Summary
  26. Section 5: Advanced Features
  27. Machine Learning with Elasticsearch
    1. Machine learning with Elastic Stack
      1. Machine learning APIs
      2. Machine learning jobs
        1. Sample data
        2. Running a single-metric job
          1. Creating index patterns
          2. Creating a new machine learning job
          3. Examining the result
    2. Machine learning using Elasticsearch and scikit-learn
    3. Summary
  28. Spark and Elasticsearch for Real-Time Analytics
    1. Overview of ES-Hadoop
    2. Apache Spark support
    3. Real-time analytics using Elasticsearch and Apache Spark
      1. Building a virtual environment to run the sample ES-Hadoop project
      2. Running the sample ES-Hadoop project
      3. Running the sample ES-Hadoop project using a prepared Docker image
      4. Source code
    4. Summary
  29. Building Analytics RESTful Services
    1. Building a RESTful web service with Spring Boot
      1. Project program structure
      2. Running the program and examining the APIs
      3. Main workflow anatomy
        1. Building the analytic model
        2. Performing daily data updates
        3. Getting the registered symbols
        4. Building the scheduler
    2. Integration with the Bollinger Band
    3. Building a Java Spark ML module for k-means anomaly detection
      1. Source code
    4. Testing Analytics RESTful services
      1. Testing the build-analytics-model API
      2. Testing the get-register-symbols API
    5. Working with Kibana to visualize the analytics results
    6. Summary
  30. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Advanced Elasticsearch 7.0
  • Author(s): Wai Tak Wong
  • Release date: August 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789957754