Designing Big Data Platforms

by Yusuf Aytas
July 2021
Beginner to intermediate
336 pages
9h 22m
English
Wiley

5 Offline Big Data Processing

After reading this chapter, you should be able to:

  • Explain the boundaries of offline data processing
  • Understand HDFS-based offline data processing
  • Understand Spark architecture and processing
  • Understand the use of Flink and Presto for offline data processing

After visiting data storage techniques for Big Data, we are now ready to dive into data processing techniques. In this chapter, we will examine offline data processing technologies in depth.

5.1 Defining Offline Data Processing

Online processing occurs when applications driven by user input must respond to the user promptly. Offline processing, by contrast, carries no commitment to respond to the user. Offline Big Data processing rests on the same basis: if there is no commitment to meet a time boundary when processing, I call it offline Big Data processing. Note that I have somewhat changed the traditional definition of offline. Here, offline processing refers to operations that take place without user engagement. I purposely avoided the term “batch processing” because online systems can also perform operations in bulk. What's more, near-real-time Big Data might have to be processed in micro‐batches. Nonetheless, we will focus on offline processing in this chapter.

Offline Big Data processing offers capabilities to transform, manage, or analyze data in bulk. A typical offline flow consists of steps to cleanse, transform, consolidate, and aggregate data. Once the data ...
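
To make the four steps of a typical offline flow concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the record fields (`user`, `amount`, `country`), the sample data, and the function names are all hypothetical, and a real platform would run these steps over distributed storage rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw records from two sources; field names are illustrative only.
RAW_ORDERS = [
    {"user": " alice ", "amount": "10.50", "country": "us"},
    {"user": "bob", "amount": "n/a", "country": "DE"},  # unparseable amount
    {"user": "alice", "amount": "4.50", "country": "US"},
]
RAW_REFUNDS = [
    {"user": "alice", "amount": "-2.00", "country": "US"},
]


def cleanse(records):
    """Drop records whose amount cannot be parsed as a number."""
    clean = []
    for record in records:
        try:
            float(record["amount"])
        except ValueError:
            continue
        clean.append(record)
    return clean


def transform(records):
    """Normalize formats: trimmed user, float amount, upper-case country."""
    return [
        {
            "user": r["user"].strip(),
            "amount": float(r["amount"]),
            "country": r["country"].upper(),
        }
        for r in records
    ]


def consolidate(*datasets):
    """Merge several cleaned datasets into a single list."""
    merged = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged


def aggregate(records):
    """Sum amounts per (country, user) pair."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["country"], r["user"])] += r["amount"]
    return dict(totals)


def run_offline_job():
    """Run the whole flow; nothing here waits on, or answers to, a user."""
    orders = transform(cleanse(RAW_ORDERS))
    refunds = transform(cleanse(RAW_REFUNDS))
    return aggregate(consolidate(orders, refunds))
```

Calling `run_offline_job()` returns one net total per country and user. Keeping each step as a separate function mirrors how offline flows are usually staged, so that any single step can be rerun or replaced without touching the others.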



Publisher Resources

ISBN: 9781119690924