Video description
This webcast presents "Coral", a solution for streaming anomaly detection. The Coral system is composed of three elements: a machine learning module, an event processing scoring module, and a data store, implemented using Spark, Akka, and Cassandra, respectively.

Spark is employed to train the model, which identifies anomalous events in a given stream of incoming events. This module uses Spark SQL sample statistics and Spark MLlib k-means clustering to identify the outliers. The model is retrained at regular intervals as new (micro)batches of events arrive. We re-run the training algorithm with Spark Streaming so that the anomaly detection model stays up to date, even under changing trends and conditions.

Both the stream of events and the trained anomaly detection model are persisted in Cassandra. Events are collected in Cassandra and read out by Spark to perform the machine learning analytics. Once the model is trained in Spark, the model's parameters are written back to Cassandra. The model stored in Cassandra is then accessed by the event processing module, implemented in Akka. The Akka runtime module scores thousands of events per second per node.

Each element of this system (Spark, Akka, Cassandra) can be distributed across multiple nodes, which gives the solution strong resilience and availability characteristics.

In this webcast you will learn how to:
- determine when to use batch, microbatch, and event data processing
- build an anomaly detection module using Spark, Spark SQL, and Spark MLlib
- build an event processing engine with Akka
- set up Cassandra to persist events as well as machine learning models
- keep a machine learning model up to date by using Spark Streaming
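The following minimal, plain-Python sketch illustrates the k-means scoring idea. It is not Coral's code and does not use Spark, Akka, or Cassandra. In place of Spark MLlib it uses a small hand-written k-means (Lloyd's algorithm). It also assumes that the "sample statistics" are the mean and standard deviation of each training event's distance to its nearest cluster centre, and that an event counts as anomalous when its distance exceeds the mean plus 3 standard deviations. The webcast description does not state the exact rule. The function names `kmeans`, `train`, and `is_anomaly`, the `n_sigma=3.0` cutoff, and the dict-shaped model are all hypothetical.

```python
import math
import random
from statistics import mean, stdev


def dist(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for Spark MLlib's KMeans (assumption)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres drawn from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster becomes empty
                centroids[i] = tuple(mean(col) for col in zip(*members))
    return centroids


def train(points, k, n_sigma=3.0):
    """Fit centres, then derive a distance threshold from sample statistics.

    Hypothetical rule: threshold = mean + n_sigma * stdev of the distances
    from each training event to its nearest centre.
    """
    centroids = kmeans(points, k)
    distances = [min(dist(p, c) for c in centroids) for p in points]
    threshold = mean(distances) + n_sigma * stdev(distances)
    # These parameters are what a real system would persist (e.g. in Cassandra).
    return {"centroids": centroids, "threshold": threshold}


def is_anomaly(model, event):
    """Cheap per-event check: is the event too far from every centre?"""
    nearest = min(dist(event, c) for c in model["centroids"])
    return nearest > model["threshold"]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic "normal" events forming two clusters around (0, 0) and (10, 10).
    normal = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
    normal += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(100)]
    model = train(normal, k=2)
    print(is_anomaly(model, (0.0, 0.0)))    # False
    print(is_anomaly(model, (50.0, 50.0)))  # True
```

In the architecture described above, `train` roughly corresponds to the periodic Spark job that is re-run on each new batch of events. The returned centroids and threshold are the model parameters that would be written to the data store. `is_anomaly` is the kind of lightweight per-event check the Akka scoring module would perform. Keeping the scoring step to a nearest-centre distance comparison is what makes high per-node throughput plausible.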
Product information
- Title: How to Build an Anomaly Detection Engine with Spark, Akka and Cassandra
- Author(s):
- Release date: December 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491955253