Learning Path

Architect and Build Big Data Applications

Instructor Ben Lorica
Time to complete: 30h 56m

Published by Infinite Skills and O'Reilly Media, Inc.

Created August 2015

Data has become steadily easier to acquire, leading companies to collect all they can. The result is growth in the volume, variety, and velocity of the information available for decision making, along with a greater need for veracity. To capitalize on "big data" (which can simply mean more data than one is used to handling), an architecture must be in place to acquire, store, analyze, visualize, manage, share, and integrate the data. This Learning Path steps through the process of building application software that lets you begin analyzing, and subsequently capitalize on, all that data.

This Learning Path begins by clearly defining the challenges and characteristics of large amounts of data, especially the problem of capitalizing on it, and the technologies used to build a solution (e.g., Hadoop, Spark, NoSQL). It continues with how to design an architecture that stores data safely while remaining flexible and performant (Apache Cassandra), and how to move data into storage (Apache Kafka). Once the data is collected and stored, it can be processed (Apache Spark). With the basic workflow in place, the focus turns to scaling for the enterprise and the architectural considerations that arise at scale. The Path ends with an introduction to time-series analysis within the workflow, leading into real-world use cases for both streaming problems and in-place analytics.

This Learning Path contains self-assessments: short, multiple-choice quizzes that you'll take as you work through the Learning Path. They give you quick insight into how you're doing and take the guesswork out of learning.

What you’ll learn—and how you can apply it

  • How to design and build applications capable of handling big data at enterprise scale
  • How to work with time-sensitive, streaming data for real-time decision making
  • How to collect, process, and store time-series data with Apache Kafka, Spark, and Cassandra
  • How to keep data live and available so that your architecture stays flexible in the future
  • How to manage workflows for efficiency in providing data to analysts

This Learning Path is for you because…

  • You're a data scientist with some experience and need to know more about big data
  • You're a data engineer and need to know more about big data solutions architecture
  • You're a data architect and need to learn big data solutions

Prerequisites:

  • Knowledge of the Linux operating system
  • Proficiency in object-oriented programming (e.g., Java, Scala, Python)
  • Proficiency in reading log files
  • Basic knowledge of distributed computing
  • Basic knowledge of machine learning

Materials or downloads needed in advance: Supplemental Content