Time series data
Architecture and use cases
The ongoing and steep increase in the number of internet-connected devices is inescapable, but traditional data processing pipelines are not well-equipped to deal with streaming data and other data whose defining dimension is time. This course will provide an overview of time series data. You will dive into real-world use cases and look at different patterns to get the most value from your datasets. This course is designed to help analysts, engineers, architects, and product managers get the most out of time series data.
What you'll learn and how you can apply it
By the end of this live, online course, you'll understand:
- How to store time series data for different use cases
- How to learn from time series data with Spark and Spark MLlib
- How to set up time series data to be accessed in real time
And you'll be able to:
- Gain insight from your time series data
- Increase accessibility to your time series data
This training course is for you because...
- You are an analyst looking for new ideas of what is possible with time series data.
- You are a software engineer who wants to use big data toolkits to handle time series in a way that maximises value.
- You are an architect or product manager who wants to discover new use cases to get real value from time series data.
To get the most out of this class, you should have:
- Use cases involving time series data
- An understanding of basic time series data models
About your instructor
Ted Malaska is the director of engineering for data streaming and persistence at Capital One. Previously, he was on the Battle.net team at Blizzard Entertainment, a principal solutions architect at Cloudera, where he helped clients succeed with Hadoop and the Hadoop ecosystem, and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more projects. Ted is the coauthor of Hadoop Application Architectures, a frequent conference speaker, and a blogger on data architectures.
Schedule
The timeframes are only estimates and may vary according to how the class progresses.
- Overview of time series (25 minutes)
- Breakdown of common time series use cases (10 minutes)
- Background on distributed execution (30 minutes)
- Background on storage formats (30 minutes)
- Summary of tricks from execution and storage (10 minutes)
- The different types of implementation for each use case (15 minutes)
  - Streaming NRT (near real time)
  - Time series DB
  - Machine learning
- Use case: Rolling averages, counts, stddev (30 minutes)
- Develop use cases for the class to work on during the break (25 minutes)
- Use case: Frequency (30 minutes)
- Use case: Comparing curves (30 minutes)
- Use case: Causation (40 minutes)
- Use case: N-grams of events: finding patterns (20 minutes)
- Use case: NRT alerting based on trends (30 minutes)
- Review class use cases and code (20 minutes)
- Review of execution and storage principles (10 minutes)