September 2018
Intermediate to advanced
412 pages
11h 12m
English
Spark is a widely used framework for processing large datasets with distributed computing. It can consume data streams ingested through Kafka and run streaming analytics for use cases such as anomaly detection and other mission-critical events, raising alarms in real time according to business requirements.
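The streaming anomaly-detection use case can be illustrated with a minimal, library-free Python sketch. It deliberately does not use Spark or Kafka APIs. The input is assumed to be an iterable of numeric readings standing in for records consumed from a Kafka topic. The hypothetical `detect_anomalies` function applies a rolling z-score check, which is one simple technique among many and not necessarily the one the book uses. `window` and `threshold` are illustrative parameters.

```python
from collections import deque
from math import sqrt
from typing import Iterable, Iterator, Tuple


def detect_anomalies(stream: Iterable[float], window: int = 5,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the recent rolling mean.

    Hypothetical helper for illustration only; not part of Spark or Kafka.
    """
    recent = deque(maxlen=window)  # sliding window of the latest readings
    for i, value in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            std = sqrt(sum((x - mean) ** 2 for x in recent) / window)
            # Flag readings more than `threshold` standard deviations away;
            # with zero variance, any departure from the mean is flagged.
            if (std == 0 and value != mean) or (
                    std > 0 and abs(value - mean) > threshold * std):
                yield i, value  # a real job would raise an alarm here
        recent.append(value)


if __name__ == "__main__":
    readings = [10, 11, 10, 12, 11, 10, 50, 11, 10]
    for index, value in detect_anomalies(readings):
        print(f"ALARM: reading {index} has anomalous value {value}")
```

In an actual Spark deployment, a comparable check could run inside a streaming query over the Kafka source rather than a plain Python loop, so that it scales across the cluster.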
Spark supports a wide variety of analytics, ranging from simple math to machine learning with statistical and physics-based models. It can also run data transformations, such as extract, transform, and load (ETL) jobs, before the business analytics execute to produce insights. Spark analytics can be configured to run in batch for long processing jobs in order to analyze multiple ...