Live online training

NoSQL Databases and Elastic Stack Primer

Topic: Data
Noureddin Sadawi

NoSQL databases (DBs) have gained much attention as the volume of data generated every minute of every day has grown. Because large amounts of this data are not immediately suitable for storage in relational databases, it makes sense to find another way. This is where NoSQL (and consequently platforms such as Elasticsearch) comes into play. In this course we learn the Elastic Stack for NoSQL data storage and retrieval. In more detail, we cover how to use the Elastic Stack to aggregate log event data in real time. The Elastic Stack consists of the following four powerful tools: Elasticsearch, Logstash, Kibana and Beats.

Elasticsearch is a NoSQL DB and a distributed search and analytics engine with multiple benefits. For example, it is easy to install and use, and it is a powerful search technology (based on Apache Lucene). These strengths have made it popular with a large open-source community. Logstash is a log shipping and filtering service (a transportation pipeline) used to populate Elasticsearch with data. Kibana is a web interface that connects users with the Elasticsearch database. It enables visualizations, dashboards and search options. Beats is a lightweight data collector.

In this course you will learn the Elastic Stack from the ground up. We will go through several features of its components and explain the terminology. We will see live how to install and configure it correctly. We will also learn how to install useful plugins, add documents to Elasticsearch and execute queries to retrieve data. In addition, we will cover how to communicate with Elasticsearch programmatically (using programming languages such as Python, Java and R).

What you'll learn and how you can apply it

  • Develop an understanding of what NoSQL databases are
  • Learn what the elastic stack is and develop an understanding of its components
  • Learn how to correctly install and configure all components of the elastic stack and ensure they can communicate successfully
  • Develop an understanding of Elasticsearch’s terminology, indexing and how to create/delete indices
  • Learn how to use Logstash, Kibana and Beats with Elasticsearch
  • Learn how to add new documents, retrieve documents (i.e. run queries), delete and/or update documents
  • Learn how to communicate with Elasticsearch programmatically (using programming languages such as Python, Java and R)

This training course is for you because...

  • You are familiar with relational databases and how to store, retrieve, update and delete data, but you want to extend your skills to newer approaches to data storage and retrieval
  • You would like to learn what NoSQL is, why it is useful and which scenarios suit it best (i.e. you need to store data and you must decide how it is stored)
  • You would like to become a competent user of the Elastic Stack (which is becoming more popular by the day)
  • You would like to learn how to correctly install and configure the elastic stack on different operating systems
  • You would like to learn how to use Elasticsearch programmatically (using programming languages such as Python and R)

Prerequisites

  • Familiarity with relational database management systems such as MySQL, MS SQL Server and others
  • Familiarity with the JSON file format (JavaScript Object Notation)
  • Familiarity with communicating with RESTful APIs

Course Set-up

  • Any operating system is fine
  • Speedy internet connection
  • Java 1.8 or later installed on your operating system (with JAVA_HOME set up correctly)

Recommended Preparation

Recommended Follow-up

About your instructor

  • Dr. Noureddin Sadawi is a consultant in machine learning and data science. He has several years’ experience in various areas involving data manipulation and analysis. He received his PhD from the University of Birmingham, United Kingdom. During his PhD he developed a technique to extract precise information from bitmap images of chemical structure diagrams. He developed a tool called MolRec and used it to participate in evaluation contests at two international events, TREC2011 and CLEF2012, and won both of them.

    Noureddin is an avid scientific software researcher and developer who has a passion for learning and teaching new technologies. He has been involved in several projects spanning a variety of fields such as bioinformatics, drug discovery, omics data analysis and much more. He has taught at multiple universities in the UK and has worked as a software engineer in different roles. One of his latest positions was as a research associate at Imperial College London, where he contributed significantly to the PhenoMeNal project (a project that heavily uses Docker). Currently, he is a research fellow at the Department of Computer Science, Brunel University London, where he develops deep learning techniques for the analysis of human gesture data.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Part 1: Introduction and Elastic Stack Installation and Configuration (50 minutes)

  • Introduction and overview of the elastic stack
  • Why use the elastic stack (learn many features of its components)
  • Understanding the data flow in the elastic stack
  • Installing the elastic stack on a cloud instance (on AWS)
  • Configuring Elasticsearch, Logstash, Kibana and Beats to work and communicate correctly
  • Q&A

Break (10 minutes)

Part 2: Understanding Elasticsearch and Performing CRUD operations on it (50 minutes)

  • A deeper look into Elasticsearch and how it works
  • Understanding Elasticsearch’s Terminology
  • Some useful Elasticsearch plugins
  • CRUD operations on Elasticsearch
  • Creating Documents in Elasticsearch
  • Retrieving Documents from Elasticsearch
  • Updating Documents in Elasticsearch
  • Deleting Documents from Elasticsearch
  • Communicating with Elasticsearch programmatically (using Python and R)
  • Q&A
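
As a rough illustration of the CRUD and query topics in this part, the sketch below builds the HTTP requests that Python could send to Elasticsearch's REST API. It is not course material: it assumes Elasticsearch 7 or later running unsecured at http://localhost:9200, and the helper names (es_request, create_doc, and so on) and the "courses" index are hypothetical. It uses only the Python standard library rather than an official client.

```python
import json
from urllib.request import Request, urlopen

# Assumption: a local, unsecured Elasticsearch 7+ node on the default port.
BASE_URL = "http://localhost:9200"


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


def create_doc(index, doc_id, doc):
    # Create: PUT /<index>/_doc/<id> stores the document under the given id.
    return es_request("PUT", f"/{index}/_doc/{doc_id}", doc)


def get_doc(index, doc_id):
    # Retrieve: GET /<index>/_doc/<id> returns the stored document.
    return es_request("GET", f"/{index}/_doc/{doc_id}")


def update_doc(index, doc_id, changes):
    # Update: POST /<index>/_update/<id> merges the fields given under "doc".
    return es_request("POST", f"/{index}/_update/{doc_id}", {"doc": changes})


def delete_doc(index, doc_id):
    # Delete: DELETE /<index>/_doc/<id> removes the document.
    return es_request("DELETE", f"/{index}/_doc/{doc_id}")


def search_docs(index, field, text):
    # Query: a full-text "match" query on a single field.
    return es_request("POST", f"/{index}/_search", {"query": {"match": {field: text}}})


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running node)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the requests instead of sending them, so no server is required.
    for req in (
        create_doc("courses", "1", {"title": "Elastic Stack Primer"}),
        get_doc("courses", "1"),
        update_doc("courses", "1", {"level": "beginner"}),
        search_docs("courses", "title", "elastic"),
        delete_doc("courses", "1"),
    ):
        print(req.get_method(), req.full_url)
```

With a node running, passing any of these requests to send() would return the parsed JSON response; the official Python client wraps the same endpoints behind a higher-level interface.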

Break (10 minutes)

Part 3: Adding Documents and Logs to the Elastic Stack (50 minutes)

  • Installing and configuring nginx to work as a reverse proxy so Kibana can be accessed on the internet
  • Using Logstash to collect static Apache logs and analyzing them using Kibana
  • Using Logstash to collect a static CSV file and analyzing its data using Kibana
  • Collecting real-time web logs, configuring Beats to upload them to Elasticsearch and analyzing them using Kibana
  • Monitoring the performance of the Elastic Stack
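
To give a feel for what happens to a log line on its way into Elasticsearch, here is a minimal Python sketch of the kind of parsing that Logstash performs in the course. It is a stand-in written for illustration, not Logstash itself: the regular expression, the field names, the "weblogs" index and the helper names are all assumptions. It turns an Apache Common Log Format line into a JSON document and formats documents as a body for Elasticsearch's _bulk endpoint (Elasticsearch 7 or later assumed).

```python
import json
import re
from datetime import datetime

# Assumption: lines follow the Apache Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Turn one access-log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    # Apache writes "-" when no bytes were sent.
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    # Convert "10/Oct/2000:13:55:36 -0700" into an ISO 8601 timestamp.
    parsed = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    doc["timestamp"] = parsed.isoformat()
    return doc


def to_bulk_body(docs, index="weblogs"):
    """Format documents as newline-delimited JSON for POST /_bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(to_bulk_body([parse_line(sample)]))
```

The resulting body could be sent to /_bulk with the Content-Type header set to application/x-ndjson. In the course itself, Logstash and Beats take care of this parsing and shipping.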

Q&A (10 minutes)

Course wrap up