Java Cookbook, 1st Edition
by Ian F. Darwin
February 2025
Intermediate to advanced
684 pages
16h 14m
English
O'Reilly Media, Inc.

Chapter 12. Data Science and R

12.0 Introduction

Data science is a relatively new discipline that first came to the attention of many with a 2010 article by O’Reilly’s Mike Loukides. While there are many definitions in the field, Loukides distills his detailed observation of and participation in data science into this definition:

A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products.

One of the main open source ecosystems for data science software is hosted by the Apache Software Foundation. It includes Hadoop (which comprises the Hadoop Distributed File System [HDFS], Hadoop MapReduce, the Ozone object store, and the YARN scheduler), the Cassandra distributed database, and the Spark compute engine. Read the Modules and Related projects sections of the Hadoop page for a current list.
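For readers who have not met the MapReduce programming model named above, here is a minimal sketch of the idea in plain Java streams. It does not use Hadoop's API and runs on a single JVM; the class name `WordCountSketch` and the method `countWords` are hypothetical names chosen for illustration. The "map" step splits lines into words and the "reduce" step counts each distinct word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical class name; plain JDK only, no Hadoop or Spark dependency.
class WordCountSketch {

    // "Map" phase: split each line into lowercase words.
    // "Reduce" phase: group identical words and count each group.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Data science enables the creation of data products",
                "A data product creates more data");
        // Print each word with its count, in alphabetical order.
        countWords(lines).forEach(
                (word, count) -> System.out.println(word + "\t" + count));
    }
}
```

A real Hadoop or Spark job distributes these same two phases across many machines; this sketch only shows the shape of the computation.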

What’s interesting here is that a great deal of this infrastructure, which is taken for granted by data scientists, is written in Java and Scala (a JVM language). Much of the rest is written in Python, a language that complements Java. Many users see only the Python side of things and don’t realize that Java is behind some of the infrastructure.

Data science (DS) problems can involve a lot of setup, so we’ll give only one example from traditional DS, using the Spark framework. Spark is written in Scala, which compiles to JVM bytecode, so it can be called directly from Java code.

In the rest of the chapter I’ll ...

ISBN: 9781098169961