How to Build an Anomaly Detection Engine with Spark, Akka and Cassandra

by Natalino Busa

Released December 2015

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781491955253

Video description

This webcast presents "Coral", a solution for streaming anomaly detection. The Coral system is composed of three elements: a machine learning module, an event processing (scoring) module, and a data store, implemented with Spark, Akka, and Cassandra, respectively.

Spark is employed to train the model, which identifies anomalous events in a stream of incoming events. This module uses Spark SQL sample statistics and Spark MLlib k-means clustering to identify the outliers. The model is retrained at regular intervals as new (micro)batches of events arrive. The training algorithm is re-run with Spark Streaming, so that the trained anomaly detection model stays up to date even under changing trends and conditions.

Both the stream of events and the trained anomaly detection model are persisted in Cassandra. Events are collected in Cassandra and read out by Spark to perform the machine learning analytics. Once the model is trained in Spark, the model's parameters are written back to Cassandra. The model stored in Cassandra is subsequently accessed by the event processing module, implemented in Akka. The Akka runtime module then scores thousands of events per second per node.

Each element of this system (Spark, Akka, Cassandra) can be distributed over multiple nodes. Therefore this solution provides strong resilience and availability characteristics.

In this webcast you will learn how to:

- determine when to use batch, microbatch, and event data processing
- build an anomaly detection module using Spark, Spark SQL, and Spark MLlib
- build an event processing engine with Akka
- set up Cassandra to persist events as well as machine learning models
- keep a machine learning model up to date by using Spark Streaming
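The split between training and scoring described above can be illustrated with a minimal sketch. It is not Coral's code: it uses only the Python standard library in place of Spark, Akka, and Cassandra, and the names `train`, `is_anomaly`, and `nearest` are hypothetical. The description does not say how outliers are derived from the k-means clusters, so the rule used here (flag an event whose distance to its nearest centroid exceeds the mean training distance plus three standard deviations) is an assumption made for illustration.

```python
import math
import statistics


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


def train(points, k, iterations=20, sigmas=3.0):
    """Batch step: fit k centroids (Lloyd's algorithm) and a distance threshold.

    The returned dict stands in for the model parameters that the
    description says are written back to the data store.
    """
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]  # keep an empty cluster's centroid
            for i, members in enumerate(clusters)
        ]
    dists = [nearest(p, centroids)[1] for p in points]
    # Assumed outlier rule: mean distance plus `sigmas` standard deviations.
    threshold = statistics.mean(dists) + sigmas * statistics.pstdev(dists)
    return {"centroids": centroids, "threshold": threshold}


def is_anomaly(model, event):
    """Per-event step: score one event against the stored parameters."""
    return nearest(event, model["centroids"])[1] > model["threshold"]


# Made-up training batch with two tight clusters.
events = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (1, 1), (11, 11)]
model = train(events, k=2)

print(is_anomaly(model, (0.5, 0.6)))  # False: close to the first centroid
print(is_anomaly(model, (5.0, 5.0)))  # True: far from both centroids
```

Scoring needs only the centroids and the threshold, which is why a small set of parameters can be trained in one component, persisted, and then read by a separate event processor. Retraining on each new microbatch amounts to calling `train` again and replacing the stored parameters.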