O'Reilly Live Online Training

Time series data

Architecture and use cases

Ted Malaska

The ongoing and steep increase in the number of internet-connected devices is inescapable, but traditional data processing pipelines are not well-equipped to deal with streaming data and other data whose defining dimension is time. This course will provide an overview of time series data. You will dive into real-world use cases and look at different patterns to get the most value from your datasets. This course is designed to help analysts, engineers, architects, and product managers get the most out of time series data.

What you'll learn and how you can apply it

By the end of this live, online course, you'll understand:

  • How to store time series data for different use cases
  • How to learn from time series data with Spark and Spark MLlib
  • How to set up time series data to be accessed in real time

And you'll be able to:

  • Gain insight from your time series data
  • Increase accessibility to your time series data

This training course is for you because...

  • You are an analyst looking for new ideas of what is possible with time series data.
  • You are a software engineer who wants to use big data toolkits to handle time series in a way that maximises value.
  • You are an architect or product manager who wants to discover new use cases to get real value from time series data.


To get the most out of this class, you should:

  • Have use cases involving time series data
  • Understand basic time series data models

About your instructor

  • Ted Malaska is the director of engineering for data streaming and persistence at Capital One. Previously, he was on the Battle.net team at Blizzard Entertainment, a principal solutions architect at Cloudera, where he helped clients succeed with Hadoop and the Hadoop ecosystem, and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is the coauthor of Hadoop Application Architectures, a frequent conference speaker, and a blogger on data architectures.


The timeframes are only estimates and may vary depending on how the class progresses.


  • Overview of time series (25 minutes)
  • Breakdown of common time series use cases (10 minutes)
  • Background on distributed execution (30 minutes)
  • Background on storage formats (30 minutes)
  • Summary of tricks from execution and storage (10 minutes)
  • The different types of implementation for each use case (15 minutes)
      • Batch
      • Streaming NRT
      • Time Series DB
      • Machine Learning
  • Use case: Rolling averages, counts, and standard deviations (30 minutes)
  • Develop use cases for class to work on during break (25 minutes)


  • Use case: Frequency (30 minutes)
  • Use case: Comparing curves (30 minutes)
  • Use case: Causation (40 minutes)
  • Use case: N-grams of events: finding patterns (20 minutes)
  • Use case: NRT alerting based on trends (30 minutes)
  • Review class use cases and code (20 minutes)
  • Review of execution and storage principles (10 minutes)
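
To give a flavor of the rolling averages, counts, and standard deviation use case listed in the schedule, here is a minimal sketch in plain Python. It is not taken from the course material, which works with Spark; the function name `rolling_stats`, the fixed-size window, and the sample readings are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(values, window):
    """Yield (count, mean, stddev) over the most recent `window` values.

    `values` is assumed to be already ordered by time. The standard
    deviation is the population standard deviation of the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # drops the oldest value automatically
    for value in values:
        recent.append(value)
        yield len(recent), mean(recent), pstdev(recent)


# Hypothetical sensor readings, one per time step.
readings = [10.0, 12.0, 14.0, 16.0]
for count, avg, std in rolling_stats(readings, window=2):
    print(count, avg, std)
```

A bounded `deque` keeps memory use constant no matter how long the series is, which is the same idea a streaming implementation relies on, although a distributed engine would handle windowing and ordering differently.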