Learning Path: Real-Time Data Applications

Video description

Real-time data has many useful applications: quickly identifying general patterns and trends, performing sentiment analysis, crafting responses in real time, and, perhaps most important, acting in situations where immediate analysis changes the outcome. This Learning Path provides an in-depth tour of the technologies used to process and analyze real-time data.

Publisher resources

Download Example Code

Table of contents

  1. Introduction To Cassandra
    1. Introducing The Course
    2. Understanding What Cassandra Is
    3. Learning What Cassandra Is Being Used For
    4. Understanding The System Requirements
    5. Opening The Main Virtual Machine
    6. Pop Quiz - Intro to Cassandra
  2. Getting Started With The Architecture
    1. Understanding That Cassandra Is A Distributed Database
    2. Learning What Snitch Is For
    3. Learning What Gossip Is For
    4. Learning How Data Gets Distributed
    5. Learning About Replication
    6. Learning About Virtual Nodes
    7. Pop Quiz - Getting Started with Architecture
  3. Installing Cassandra
    1. Downloading Cassandra
    2. Ensuring Oracle Java 7 Is Installed
    3. Installing Cassandra
    4. Viewing The Main Configuration File
    5. Providing Cassandra With Permission To Directories
    6. Starting Cassandra
    7. Checking Status
    8. Accessing The Cassandra system.log File
    9. Pop Quiz - Installing Cassandra
  4. Communicating With Cassandra
    1. Understanding Ways To Communicate With Cassandra
    2. Using CQLSH
    3. Pop Quiz - Communicating with Cassandra
  5. Creating A Database
    1. Understanding A Cassandra Database
    2. Defining A Keyspace
    3. Deleting A Keyspace
    4. Pop Quiz - Creating a Database
    5. Lab: Create A Second Database
  6. Creating A Table
    1. Creating A Table
    2. Defining Columns And Data Types
    3. Defining A Primary Key
    4. Recognizing A Partition Key
    5. Specifying A Descending Clustering Order
    6. Pop Quiz - Creating a Table
    7. Lab: Create A Second Table
  7. Inserting Data
    1. Understanding Ways To Write Data
    2. Using The INSERT INTO Command
    3. Using The COPY Command
    4. How Data Is Stored In Cassandra
    5. How Data Is Stored On Disk
    6. Pop Quiz - Inserting Data
    7. Lab: Insert Data
  8. Modeling Data
    1. Understanding Data Modeling In Cassandra
    2. Using A WHERE Clause
    3. Understanding Secondary Indexes
    4. Creating A Secondary Index
    5. Defining A Composite Partition Key
    6. Pop Quiz - Modeling Data
  9. Creating An Application
    1. Understanding Cassandra Drivers
    2. Exploring The DataStax Java Driver
    3. Setting Up A Development Environment
    4. Creating An Application Page
    5. Acquiring The DataStax Java Driver Files
    6. Getting The DataStax Java Driver Files Through Maven
    7. Providing The DataStax Java Driver Files Manually
    8. Connecting To A Cassandra Cluster
    9. Executing A Query
    10. Displaying Query Results - Part 1
    11. Displaying Query Results - Part 2
    12. Using An MVC Pattern
    13. Pop Quiz - Creating an Application
    14. Lab: Create A Second Application - Part 1
    15. Lab: Create A Second Application - Part 2
    16. Lab: Create A Second Application - Part 3
  10. Updating And Deleting Data
    1. Updating Data
    2. Understanding How Updating Works
    3. Deleting Data
    4. Understanding Tombstones
    5. Using TTLs
    6. Updating A TTL
    7. Pop Quiz - Updating and Deleting Data
    8. Lab: Update And Delete Data
  11. Selecting Hardware
    1. Understanding Hardware Choices
    2. Understanding RAM And CPU Recommendations
    3. Selecting Storage
    4. Deploying In The Cloud
    5. Pop Quiz - Selecting Hardware
  12. Adding Nodes To A Cluster
    1. Understanding Cassandra Nodes
    2. Having A Network Connection - Part 1
    3. Having A Network Connection - Part 2
    4. Having A Network Connection - Part 3
    5. Specifying The IP Address Of A Node In Cassandra
    6. Specifying Seed Nodes
    7. Bootstrapping A Node
    8. Cleaning Up A Node
    9. Using cassandra-stress
    10. Pop Quiz - Adding Nodes to a Cluster
    11. Lab: Add A Third Node
  13. Monitoring A Cluster
    1. Understanding Cassandra Monitoring Tools
    2. Using Nodetool
    3. Using JConsole
    4. Learning About OpsCenter
    5. Pop Quiz - Monitoring a Cluster
  14. Repairing Nodes
    1. Understanding Repair
    2. Repairing Nodes
    3. Understanding Consistency - Part 1
    4. Understanding Consistency - Part 2
    5. Understanding Hinted Handoff
    6. Understanding Read Repair
    7. Pop Quiz - Repairing Nodes
    8. Lab: Repair Nodes For A Keyspace
  15. Removing A Node
    1. Understanding Removing A Node
    2. Decommissioning A Node
    3. Putting A Node Back Into Service
    4. Removing A Dead Node
    5. Pop Quiz - Removing a Node
    6. Lab: Put A Node Back Into Service
  16. Redefining A Cluster For Multiple Data Centers
    1. Redefining For Multiple Data Centers - Part 1
    2. Redefining For Multiple Data Centers - Part 2
    3. Changing Snitch Type
    4. Modifying cassandra-rackdc.properties
    5. Changing Replication Strategy - Part 1
    6. Changing Replication Strategy - Part 2
    7. Pop Quiz - Redefining a Cluster
  17. Resources For Further Learning
    1. Accessing Documentation
    2. Reading Blogs And Books
    3. Watching Video Recordings
    4. Posting Questions
    5. Attending Events
    6. Wrap Up
    7. The Case for Kafka
    8. The Basics
    9. Setting up a Kafka Cluster
    10. Writing a Kafka Producer
    11. Writing a Kafka Consumer
    12. Using Kafka from Python
    13. Troubleshooting Kafka
    14. Integrating Kafka and Hadoop with Flafka
    15. Kafka Availability and Consistency
    16. Kafka Ecosystem
    17. Future of Kafka
    18. Pre-Flight Check
    19. Spark Deconstructed
    20. A Brief History
    21. Simple Spark Apps
    22. Spark Essentials
    23. Spark Examples
    24. Unifying the Pieces - Spark SQL
    25. Unifying the Pieces - Spark Streaming
    26. Unifying the Pieces - MLlib and GraphX
    27. Unified Workflows Demo
    28. The Full SDLC
    29. Developer Certification
    30. Resources
    31. Introduction - Why DataFrames?
    32. ETL to Prepare the Data from Capital Bikeshare
    33. Create a DataFrame, Explore using SQL
    34. Data Preparation for Machine Learning Models
    35. Build a Classifier Using Naive Bayes
    36. Build a Classifier Using Decision Trees
    37. Build a Classifier Using Random Forests
    38. Use a DataFrame to Compare Models
    39. Parquet as a Best Practice with DataFrames
    40. How to Store a DataFrame with Parquet
    41. How to Read a DataFrame Back in From Parquet
    42. Use SQL to Estimate Route Durations
    43. Data Preparation for GraphX - Model Route Costs
    44. Use PageRank to Rank Popular Stations
    45. Optimize Routes to Columbus Circle
    46. Compare Results with Google Maps
    47. Analyze a Popular Tourist Route
    48. Examples of How to Use DataFrames in Python
    49. Summary - The New DataFrames Features in Spark
    50. Introduction - Large-scale real time stream processing and analytics at Strata+Hadoop World - Ben Lorica
    51. Going Real-time: Data Collection and Stream Processing with Apache Kafka - Jay Kreps
    52. Say goodbye to batch - Tyler Akidau (Google)
    53. Stream Processing Everywhere - What to Use? - Jim Scott
    54. From Source to Solution: Building A System for Machine and Event-Oriented Data - Eric Sammer
    55. Spark Streaming - The State of the Union, and Beyond - Tathagata Das
    56. Dynamic Events in Massive Data Streams, from Astrophysics to Marketing Automation - Kirk Borne
    57. TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies - Anirudh Todi
    58. Streaming Analytics: It’s Not The Same Game - Subutai Ahmad
    59. Realtime Data Analysis Patterns - Mikio Braun (streamdrill)
    60. The IoT P2P Backbone - Bruno Fernandez-Ruiz
    61. Practical Methods for Identifying Anomalies That Matter in Large Datasets - Robert Grossman
  18. Introduction
    1. Introduction to Time Series Problems
  19. Kafka
    1. Kafka Architecture and Deployment
    2. Kafka Usage
  20. Spark
    1. Introduction to Spark
    2. Spark Architecture
  21. Spark Streaming
    1. Spark Streaming: Windows Slides
    2. Spark Streaming: Ingestion Sources Using Kafka
    3. Spark Streaming: Operations on the Stream
  22. Cassandra
    1. Introduction to Cassandra
    2. Cassandra Basic Architecture
    3. Replication, High Availability and Multi Datacenter
    4. Cassandra Weather Website Example
    5. Cassandra Query Language (CQL)
    6. Cassandra Partitions Clustering
    7. Cassandra Read and Write Path
    8. Working with Cassandra
    9. Cassandra Drivers and Access Patterns
  23. Spark and Cassandra
    1. Spark and Cassandra Architecture
    2. Analyzing Cassandra Data with Spark SQL
    3. Spark and Cassandra DataStax Enterprise
  24. Real World Use Cases
    1. Real World Use Cases: Streaming Problems
    2. Real World Use Cases: In-place Analytic Problems

Product information

  • Title: Learning Path: Real-Time Data Applications
  • Author(s): Ben Lorica
  • Release date: November 2015
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491952610