Data Science with Java

by Michael R. Brzustowicz
June 2017
Beginner to intermediate
233 pages
5h 57m
English
O'Reilly Media, Inc.

Chapter 1. Data I/O

Events happen all around us, continuously. Occasionally, we make a record of a discrete event at a certain point in time and space. We can then define data as a collection of records that someone (or something) took the time to write down or present in any format imaginable. As data scientists, we work with data in files, databases, web services, and more. Usually, someone has gone to a lot of trouble to define a schema or data model that precisely denotes the names, types, tolerances, and interrelationships of all the variables. However, it is not always possible to enforce a schema during data acquisition. Real data (even in well-designed databases) often has missing values, misspellings, incorrectly formatted types, duplicate representations of the same value, and, worst of all, several variables concatenated into one. Although you are probably excited to implement machine-learning algorithms and create stunning graphics, the most important and time-consuming aspect of data science is preparing the data and ensuring its integrity.
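As a rough illustration of what checking integrity can mean in practice, the following sketch flags a few of the problems listed above (missing values, badly formatted types, the wrong number of fields) in a single comma-delimited record. The record layout `name,age,city` and the class and method names are assumptions made up for this example, not anything defined in the book.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: report integrity problems in one hypothetical "name,age,city" record. */
final class RecordCheck {

    /** Returns a description of each problem found; an empty list means the record looks clean. */
    static List<String> problems(String line) {
        List<String> found = new ArrayList<>();

        // A limit of -1 keeps trailing empty fields, so "Ada,36," still yields three fields.
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            found.add("expected 3 fields, got " + fields.length);
            return found;
        }

        // Missing values show up as empty (or whitespace-only) fields.
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].trim().isEmpty()) {
                found.add("missing value in field " + i);
            }
        }

        // Incorrectly formatted type: the age field (assumed to be an integer) must parse.
        String age = fields[1].trim();
        if (!age.isEmpty()) {
            try {
                Integer.parseInt(age);
            } catch (NumberFormatException e) {
                found.add("age is not an integer: " + age);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up records: one clean, three with different problems.
        String[] samples = {"Ada,36,London", "Ada,,London", "Ada,thirty,London", "Ada,36"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + problems(sample));
        }
    }
}
```

Misspellings and duplicate representations of the same value are harder to detect mechanically and are deliberately left out of this sketch.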

What Is Data, Anyway?

Your ultimate goal is to retrieve data from its source, reduce the data via statistical analysis or learning, and then present some kind of knowledge about what was learned, usually in the form of a graph. However, even if your result is a single value such as the total revenue, most engaged user, or a quality factor, you still follow the same protocol: input data → reductive analysis → output data.
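The three-stage protocol can be sketched in a few lines of Java. This is only a toy under stated assumptions: the in-memory `orderId,amount` records stand in for a real file, database, or web service, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Minimal sketch of the protocol: input data, reductive analysis, output data. */
final class RevenuePipeline {

    /** Input: hypothetical "orderId,amount" records standing in for a real data source. */
    static List<String> readRecords() {
        return Arrays.asList("1001,19.99", "1002,5.01", "1003,25.00");
    }

    /** Reductive analysis: collapse many records into a single value, the total revenue. */
    static double totalRevenue(List<String> records) {
        double total = 0.0;
        for (String record : records) {
            String[] fields = record.split(",");
            total += Double.parseDouble(fields[1]); // second field is assumed to be the amount
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> records = readRecords();      // input data
        double total = totalRevenue(records);      // reductive analysis
        System.out.printf("Total revenue: %.2f%n", total); // output data
    }
}
```

Keeping the three stages in separate methods means the input step could later be swapped for a real reader without touching the analysis or the output.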

Considering ...

Publisher Resources

ISBN: 9781491934104