Video description
Streaming data enables you to rapidly assess and respond to events, but only if you have the right methods for processing it. In this unique O’Reilly video collection—taken from live sessions at Strata + Hadoop World 2015 in San Jose, California—you’ll learn about several analytics tools and event mining techniques from experts in the field.
Learn how to capture, process, and respond to high-velocity data quickly. This video collection includes:
Going Real-time: Data Collection and Stream Processing with Apache Kafka
Jay Kreps (Confluent)
Discover what happens when every click, impression, database change, and application log is available as a real-time stream of well-structured data—based on real-world examples from LinkedIn and other organizations.
Stream Processing Everywhere—What to Use?
Jim Scott (MapR Technologies, Inc.)
To help you decide which solution to use for processing data from social media streams and sensor devices in real time, Jim compares three Apache projects—Storm, Spark, and Samza.
From Source to Solution: Building a System for Machine and Event-Oriented Data
Eric Sammer (Rocana)
Follow the flow of data through an end-to-end system built to handle tens of terabytes of event-oriented data per hour while providing real-time streaming, in-memory, SQL, and batch access to it. You’ll learn how Hadoop, Kafka, Solr, and Impala/Hive were stitched together to build this system.
Spark Streaming—The State of the Union, and Beyond
Tathagata Das (Databricks)
Spark Streaming extends the core Apache Spark API to perform large-scale stream processing. In this session, you’ll learn about real-world use cases of Spark Streaming, as well as recent developments such as the new Python API.
Dynamic Events in Massive Data Streams, from Astrophysics to Marketing Automation
Kirk Borne (George Mason University)
Big data stream analytics and massive event mining techniques are critical in several domains, including astrophysics (the Large Synoptic Survey Telescope), social uprisings, health epidemics, seismology, cybersecurity, and more. Kirk addresses these parallels, their big data applications, and some anticipated analytics solutions, including Decision Science-as-a-Service.
TSAR (the TimeSeries AggregatoR)—How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies
Anirudh Todi (Twitter Inc.)
Find out how Twitter built TSAR from the ground up with Python and Scala on technologies such as Storm and Kafka, and learn about the challenges they faced in scaling it to process tens of billions of events per day.
Streaming Analytics: It’s Not The Same Game
Subutai Ahmad (Numenta, Inc.)
The existing big data paradigm that requires storing data for batch analysis and extensive modeling by a human expert is incredibly inefficient. In this session, you’ll explore streaming data algorithms that are highly automated, adapt to changing statistics, and naturally deal with temporal data streams. The open source project NuPIC uses many of the core ideas.
Realtime Data Analysis Patterns
Mikio Braun (TU Berlin)
Examine real-time data analysis patterns, from data acquisition and processing to storage of historic data. You’ll learn about an architecture with approximative algorithms at its core, suited to use cases such as social media data analysis and real-time user profiling and recommendation.
The IoT P2P Backbone
Bruno Fernandez-Ruiz (Yahoo)
Under current constraints, many sensor devices only send inferred metrics rather than store or broadcast raw datasets. And devices that can send raw data only do so when there’s a good connection, leading to latency in generating predictions. This insightful talk looks into these issues.
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Robert Grossman (University of Chicago)
Three case studies yielded several lessons on how to build anomaly detection systems for different operational systems. You’ll learn eight useful techniques that researchers identified from these case studies, including how best to deploy these techniques.
Table of contents
- Introduction - Large-scale real time stream processing and analytics at Strata+Hadoop World - Ben Lorica
- Going Real-time: Data Collection and Stream Processing with Apache Kafka - Jay Kreps
- Say goodbye to batch - Tyler Akidau (Google)
- Stream Processing Everywhere - What to Use? - Jim Scott
- From Source to Solution: Building A System for Machine and Event-Oriented Data - Eric Sammer
- Spark Streaming - The State of the Union, and Beyond - Tathagata Das
- Dynamic Events in Massive Data Streams, from Astrophysics to Marketing Automation - Kirk Borne
- TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies - Anirudh Todi
- Streaming Analytics: It’s Not The Same Game - Subutai Ahmad
- Realtime Data Analysis Patterns - Mikio Braun (streamdrill)
- The IoT P2P Backbone - Bruno Fernandez-Ruiz
- Practical Methods for Identifying Anomalies That Matter in Large Datasets - Robert Grossman
Product information
- Title: Large-scale Real-time Stream Processing and Analytics
- Author(s):
- Release date: June 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491931028