Learning Elastic Stack 6.0

Book description

Deliver end-to-end real-time distributed data processing solutions by leveraging the power of Elastic Stack 6.0

About This Book

  • Get to grips with the new features introduced in Elastic Stack 6.0
  • Get valuable insights from your data by working with the different components of the Elastic Stack such as Elasticsearch, Logstash, Kibana, X-Pack, and Beats
  • Includes handy tips and techniques to build, deploy, and manage your Elastic applications efficiently on-premises or in the cloud

Who This Book Is For

This book is for data professionals who want to get amazing insights and business metrics from their data sources. If you want to get a fundamental understanding of the Elastic Stack for distributed, real-time processing of data, this book will help you. A fundamental knowledge of JSON would be useful, but is not mandatory. No previous experience with the Elastic Stack is required.

What You Will Learn

  • Familiarize yourself with the different components of the Elastic Stack
  • Get to know the new functionalities introduced in Elastic Stack 6.0
  • Build effective data pipelines that move terabytes or petabytes of data into Elasticsearch and Logstash for searching and logging
  • Use Kibana to visualize data and tell data stories in real-time
  • Secure, monitor, and use the alerting and reporting capabilities of Elastic Stack
  • Take your Elastic application to an on-premise or cloud-based production environment

In Detail

The Elastic Stack is a powerful combination of tools for distributed search, analytics, logging, and visualization of data from medium to massive data sets. The newly released Elastic Stack 6.0 brings new features and capabilities that empower users to find unique, actionable insights through these techniques. This book will give you a fundamental understanding of what the stack is all about, and how to use it efficiently to build powerful real-time data processing applications.

After a quick overview of the newly introduced features in Elastic Stack 6.0, you'll learn how to set up the stack by installing the tools and reviewing their basic configurations. The book then shows you how to use Elasticsearch for distributed searching and analytics, along with Logstash for logging and Kibana for data visualization. It also demonstrates the creation of custom plugins using Kibana and Beats. You'll find out about Elastic X-Pack, a useful extension for effective security and monitoring. Finally, the book provides useful tips on how to use the Elastic Cloud and deploy the Elastic Stack in production environments.

On completing this book, you'll have a solid foundational knowledge of the basic Elastic Stack functionalities. You'll also have a good understanding of the role each component in the stack plays in solving different data processing problems.

Style and approach

This step-by-step guide will show you the Elastic Stack, covering all the components through interactive and easy-to-follow examples. It also includes handy tips.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Introducing Elastic Stack
    1. What is Elasticsearch, and why use it?
      1. Schemaless and document-oriented
      2. Searching
      3. Analytics
      4. Rich client library support and the REST API
      5. Easy to operate and easy to scale 
      6. Near real time
      7. Lightning fast
      8. Fault tolerant
    2. Exploring the components of Elastic Stack
      1. Elasticsearch
      2. Logstash
      3. Beats
      4. Kibana
      5. X-Pack
        1. Security
        2. Monitoring
        3. Reporting
        4. Alerting
        5. Graph
      6. Elastic Cloud
    3. Use cases of Elastic Stack
      1. Log and security analytics
      2. Product search
      3. Metrics analytics
      4. Web search and website search
    4. Downloading and installing
      1. Installing Elasticsearch
      2. Installing Kibana
    5. Summary
  3. Getting Started with Elasticsearch
    1. Using the Kibana Console UI
    2. Core concepts
      1. Index
      2. Type
      3. Document
      4. Node
      5. Cluster
      6. Shards and replicas
      7. Mappings and data types
        1. Data types
          1. Core datatypes
          2. Complex datatypes
          3. Other datatypes
        2. Mappings
          1. Creating an index with the name catalog
          2. Defining the mappings for the type of product
      8. Inverted index
    3. CRUD operations
      1. Index API
        1. Indexing a document by providing an ID
        2. Indexing a document without providing an ID
      2. Get API
      3. Update API
      4. Delete API
    4. Creating indexes and taking control of mapping
      1. Creating an index
      2. Creating type mapping in an existing index
      3. Updating a mapping
    5. REST API overview
      1. Common API conventions
        1. Formatting the JSON response
        2. Dealing with multiple indices
          1. Searching all documents in one index
          2. Searching all documents in multiple indexes
          3. Searching all documents of a particular type in all indices
    6. Summary
  4. Searching-What is Relevant
    1. Basics of text analysis
      1. Understanding Elasticsearch analyzers
        1. Character filters
        2. Tokenizer
          1. Standard Tokenizer
        3. Token filters
      2. Using built-in analyzers
        1. Standard Analyzer
      3. Implementing autocomplete with a custom analyzer
    2. Searching from structured data
      1. Range query
        1. Range query on numeric types
        2. Range query with score boosting
        3. Range query on dates
      2. Exists query
      3. Term query
    3. Searching from full text
      1. Match query
        1. Operator
        2. minimum_should_match
        3. Fuzziness
      2. Match phrase query
      3. Multi match query
        1. Querying multiple fields with defaults
        2. Boosting one or more fields
        3. With types of multi match queries
    4. Writing compound queries
      1. Constant score query
      2. Bool query
        1. Combining OR conditions
        2. Combining AND and OR conditions
        3. Adding NOT conditions
    5. Summary
  5. Analytics with Elasticsearch
    1. The basics of aggregations
      1. Bucket aggregations
      2. Metric aggregations
      3. Matrix aggregations
      4. Pipeline aggregations
    2. Preparing data for analysis
      1. Understanding the structure of data
      2. Loading the data using Logstash
    3. Metric aggregations
      1. Sum, average, min, and max aggregations
        1. Sum aggregation
        2. Average aggregation
        3. Min aggregation
        4. Max aggregation
      2. Stats and extended stats aggregations
        1. Stats aggregation
        2. Extended stats Aggregation
      3. Cardinality aggregation
    4. Bucket aggregations
      1. Bucketing on string data
        1. Terms aggregation
      2. Bucketing on numeric data
        1. Histogram aggregation
        2. Range aggregation
      3. Aggregations on filtered data
      4. Nesting aggregations
      5. Bucketing on custom conditions
        1. Filter aggregation
        2. Filters aggregation
      6. Bucketing on date/time data
        1. Date Histogram aggregation
          1. Creating buckets across time
          2. Using a different time zone
          3. Computing other metrics within sliced time intervals
          4. Focusing on a specific day and changing intervals
      7. Bucketing on geo-spatial data
        1. Geo distance aggregation
        2. GeoHash grid aggregation
    5. Pipeline aggregations
      1. Calculating the cumulative sum of usage over time
    6. Summary
  6. Analyzing Log Data
    1. Log analysis challenges
      1. Logstash 
        1. Installation and configuration
          1. Prerequisites
        2. Downloading and installing Logstash
          1. Installing on Windows
          2. Installing on Linux
          3. Running Logstash
    2. Logstash architecture
    3. Overview of Logstash plugins
      1. Installing or updating plugins
        1. Input plugins
        2. Output plugins
        3. Filter plugins
        4. Codec plugins
      2. Exploring plugins
        1. Exploring Input plugins
          1. File
          2. Beats
          3. JDBC
          4. IMAP
        2. Output plugins
          1. Elasticsearch
          2. CSV
          3. Kafka
          4. PagerDuty
        3. Codec plugins
          1. JSON
          2. Rubydebug 
          3. Multiline
        4. Filter plugins
    4. Ingest node
      1. Defining a pipeline 
      2. Ingest APIs
        1. Put pipeline API
        2. Get Pipeline API
        3. Delete pipeline API
        4. Simulate pipeline API
    5. Summary
  7. Building Data Pipelines with Logstash
    1. Parsing and enriching logs using Logstash
      1. Filter plugins
        1. CSV filter 
        2. Mutate filter
        3. Grok filter
        4. Date filter
        5. Geoip filter
        6. Useragent filter
    2. Introducing Beats
      1. Beats by Elastic.co
        1. Filebeat
        2. Metricbeat
        3. Packetbeat
        4. Heartbeat
        5. Winlogbeat
        6. Auditbeat
      2. Community Beats
      3. Logstash versus Beats
    3. Filebeat
      1. Downloading and installing Filebeat
        1. Installing on Windows
        2. Installing on Linux
      2. Architecture
      3. Configuring Filebeat
        1. Filebeat prospectors
        2. Filebeat global options
        3. Filebeat general options
        4. Output configuration 
        5. Filebeat modules
    4. Summary
  8. Visualizing data with Kibana
    1. Downloading and installing Kibana
      1. Installing on Windows
      2. Installing on Linux
      3. Configuring Kibana
    2. Data preparation
    3. Kibana UI
      1. User interaction
      2. Configuring the index pattern
      3. Discover
        1. Elasticsearch query string
        2. Elasticsearch DSL query
      4. Visualize
        1. Kibana aggregations
          1. Bucket aggregations
          2. Metric
      5. Creating a visualization
      6. Visualization types
        1. Line, area, and bar charts
        2. Data table
        3. MarkDown widget
        4. Metric
        5. Goal
        6. Gauge
        7. Pie charts
        8. Co-ordinate maps
        9. Region maps
        10. Tag cloud
      7. Visualizations in action
        1. Response codes over time
        2. Top 10 URLs requested
        3. Bandwidth usage of top five countries over time
        4. Web traffic originating from different countries
        5. Most used user agent
      8. Dashboards
        1. Creating a dashboard
        2. Saving the dashboard 
        3. Cloning the dashboard
        4. Sharing the dashboard 
    4. Timelion
      1. Timelion UI
      2. Timelion expressions
    5. Using plugins
      1. Installing plugins
      2. Removing plugins
    6. Summary
  9. Elastic X-Pack
    1. Installing X-Pack 
      1. Installing X-Pack on Elasticsearch
      2. Installing X-Pack on Kibana
      3. Uninstalling X-Pack
    2. Configuring X-Pack
    3. Security
      1. User authentication
      2. User authorization
      3. Security in action
        1. New user creation
          1. Deleting a user
          2. Changing the password
        2. New role creation
          1. How to Delete/Edit a role
        3. Document-level security or field-level security
        4. X-Pack security APIs
          1. User management APIs
          2. Role management APIs
    4. Monitoring Elasticsearch
      1. Monitoring UI
        1. Elasticsearch metrics
          1. Overview tab
          2. Nodes tab
          3. The Indices tab
    5. Alerting
      1. Anatomy of a watch
      2. Alerting in action
        1. Create a new alert
          1. Threshold Alert
          2. Advanced Watch
        2. How to Delete/Deactivate/Edit a Watch
    6. Summary
  10. Running Elastic Stack in Production
    1. Hosting Elastic Stack on a managed cloud
      1. Getting up and running on Elastic Cloud
      2. Using Kibana
      3. Overriding configuration 
      4. Recovering from a snapshot
    2. Hosting Elastic Stack on your own
      1. Selecting hardware
      2. Selecting an operating system
      3. Configuring Elasticsearch nodes
        1. JVM heap size
        2. Disable swapping
        3. File descriptors
        4. Thread pools and garbage collector
      4. Managing and monitoring Elasticsearch
      5. Running in Docker containers
      6. Special considerations while deploying to a cloud
        1. Choosing instance type
        2. Changing default ports; do not expose ports!
        3. Proxy requests
        4. Binding HTTP to local addresses
        5. Installing EC2 discovery plugin
        6. Installing S3 repository plugin
        7. Setting up periodic snapshots
    3. Backing up and restoring
      1. Setting up a repository for snapshots
        1. Shared filesystem
      2. Cloud or distributed filesystems
      3. Taking snapshots
      4. Restoring a specific snapshot
    4. Setting up index aliases
      1. Understanding index aliases
      2. How index aliases can help
    5. Setting up index templates
      1. Defining an index template
      2. Creating indexes on the fly
    6. Modeling time series data
      1. Scaling the index with unpredictable volume over time
        1. Unit of parallelism in Elasticsearch
          1. The effect of the number of shards on the relevance score
          2. The effect of the number of shards on the accuracy of aggregations
      2. Changing the mapping over time
        1. New fields get added
        2. Existing fields get removed
      3. Automatically deleting older documents
      4. How index-per-timeframe solves these issues
        1. Scaling with index-per-timeframe
        2. Changing the mapping over time
        3. Automatically deleting older documents
    7. Summary
  11. Building a Sensor Data Analytics Application
    1. Introduction to the application
      1. Understanding the sensor-generated data
      2. Understanding the sensor metadata
      3. Understanding the final stored data
    2. Modeling data in Elasticsearch
      1. Defining an index template
      2. Understanding the mapping
    3. Setting up the metadata database
    4. Building the Logstash data pipeline
      1. Accept JSON requests over the web
      2. Enrich the JSON with the metadata we have in the MySQL database
        1. The jdbc_streaming plugin 
        2. The mutate plugin
          1. Move the looked-up fields that are under lookupResult directly in JSON
          2. Combine the latitude and longitude fields under lookupResult as a location field
          3. Remove the unnecessary fields
      3. Store the resulting documents in Elasticsearch
    5. Sending data to Logstash over HTTP
    6. Visualizing the data in Kibana
      1. Set up an index pattern in Kibana
      2. Build visualizations
        1. How does the average temperature change over time?
        2. How does the average humidity change over time?
        3. How do temperature and humidity change at each location over time?
        4. Can I visualize temperature and humidity over a map?
        5. How are the sensors distributed across departments?
      3. Create a dashboard
    7. Summary
  12. Monitoring Server Infrastructure
    1. Metricbeat
      1. Downloading and installing Metricbeat
        1. Installing on Windows
        2. Installing on Linux
      2. Architecture
        1. Event structure
    2. Configuring Metricbeat
      1. Module configuration
        1. Enabling module configs in the modules.d directory
        2. Enabling module config in the metricbeat.yml file
      2. General settings
      3. Output configuration 
      4. Logging
    3. Capturing system metrics
      1. Running Metricbeat with the system module
      2. Specifying aliases
      3. Visualizing system metrics using Kibana
    4. Deployment architecture
    5. Summary

Product information

  • Title: Learning Elastic Stack 6.0
  • Author(s): Pranav Shukla, Sharath Kumar M N
  • Release date: December 2017
  • Publisher(s): Packt Publishing
  • ISBN: 9781787281868