CHAPTER 11
Java Programming for Big Data Applications

Data is the new oil.

—Clive Humby

11.1 What Is Big Data?

Data plays an important role in our lives, and we use it all the time. Experimental data saved in a file, student or employee records saved in a database, sales figures saved in a spreadsheet, and common Word, Excel, and PowerPoint files, sound files, and movie files are all traditional data files. They can be stored, analyzed, and displayed on a standard personal computer. Traditional data files range in size from kilobytes (KB, 2¹⁰ bytes) through megabytes (MB, 2²⁰ bytes) to gigabytes (GB, 2³⁰ bytes), and occasionally to terabytes (TB, 2⁴⁰ bytes). However, with the rapid expansion of the Internet and the growing number of mobile users, data collections can now exceed a petabyte (PB, 2⁵⁰ bytes). Such data stores are called big data. Big data is too large and too complex to be stored and analyzed with traditional computer hardware and software.
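To make the size units above concrete, here is a minimal Java sketch. It assumes the binary convention used in the text, where each unit is 2¹⁰ = 1024 times the previous one. The class `DataSizeUnits` and its methods `bytesIn` and `humanReadable` are hypothetical names chosen for illustration. They are not part of any library.

```java
import java.util.Locale;

/**
 * Hypothetical helper illustrating binary data-size units:
 * KB = 2^10, MB = 2^20, GB = 2^30, TB = 2^40, PB = 2^50 bytes.
 */
public class DataSizeUnits {

    // Units in increasing order; UNITS[i] equals 2^(10 * (i + 1)) bytes.
    private static final String[] UNITS = {"KB", "MB", "GB", "TB", "PB"};

    /** Returns the number of bytes in the given unit, e.g. "GB" -> 2^30. */
    public static long bytesIn(String unit) {
        for (int i = 0; i < UNITS.length; i++) {
            if (UNITS[i].equals(unit)) {
                // Shifting 1 left by 10*(i+1) bits computes 2^(10*(i+1)).
                return 1L << (10 * (i + 1));
            }
        }
        throw new IllegalArgumentException("Unknown unit: " + unit);
    }

    /** Formats a byte count with the largest unit that keeps the value >= 1. */
    public static String humanReadable(long bytes) {
        if (bytes < 1024) {
            return bytes + " B";
        }
        // Start from the largest unit and step down until it fits.
        int i = UNITS.length - 1;
        while (bytes < bytesIn(UNITS[i])) {
            i--;
        }
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        return String.format(Locale.ROOT, "%.1f %s",
                (double) bytes / bytesIn(UNITS[i]), UNITS[i]);
    }

    public static void main(String[] args) {
        for (String unit : UNITS) {
            System.out.println(unit + " = " + bytesIn(unit) + " bytes");
        }
        System.out.println(humanReadable(3L * bytesIn("TB")));
    }
}
```

Running `main` prints each unit's size in bytes, ending with `PB = 1125899906842624 bytes`, followed by `3.0 TB`. A `long` holds values up to about 2⁶³, so it comfortably covers petabyte-scale counts. Sizes far beyond that would need `java.math.BigInteger`.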

According to Wikipedia, the world's technological per-capita data capacity has doubled roughly every 40 months since the 1980s. As of 2012, about 2.5 exabytes (2.5×10¹⁸ bytes) of data were generated every day. According to an International ...
