O'Reilly Live Online Training

Learn Data Science with Java

Topic: Data
Shaun Wassell

Data science is an incredibly interesting field, not to mention a lucrative one: “data scientist” regularly tops lists with titles like “hottest jobs of 2020”. Data science has profound real-world implications for nearly every industry. That is why so many companies are looking for programmers who know how to do data science and, conversely, for data scientists who know how to program. That’s where Java comes in. As you’ll see in this course, Java and data science are a match made in heaven: Java’s power, flexibility, and popularity make it an excellent tool for the budding data scientist.

What you'll learn, and how you can apply it

  • Learn what data science is, how it can be used, and the basic data-science process
  • Learn the basics of working with and manipulating data in Java
  • Learn how to visualize data, and which libraries can help you do so
  • Learn the most important data-science algorithms for turning data into information, including k-nearest neighbors, Naive Bayes, linear regression, decision trees, and more

This training course is for you because...

  • You’re a Java developer who wants to advance your career by breaking into the data-science field
  • You’re a data scientist who wants to see how to use Java in the field
  • You’re neither a data scientist nor a programmer and want to dive head-first into both

Prerequisites

  • Java: Knowledge of basic Java syntax and Object-Oriented Programming concepts.
  • Familiarity with mathematical concepts: Some experience with basic algebra.


About your instructor

  • Shaun is a lifelong programmer and problem-solving addict. His goal is to help people build incredible software and solve meaningful problems by mastering the art of software development. He currently works as a Senior React Developer, but also has a lot of side gigs, including consulting, freelance development, and online education. Don’t hesitate to get in contact with him if you enjoy his materials.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Day 1

Introduction (5 mins)

Learn the Basics of Data Science (55 mins)

  • Learn what data science is
  • See examples of what data science can do
  • Learn the basics of modeling and machine learning
  • Learn about the bias-variance trade-off
  • Learn about feature extraction and selection
  • Learn the difference between supervised, unsupervised, and reinforcement learning
  • Q&A

Break (5 mins)

Gather and use data (55 mins)

  • Find and load data using Java
  • Perform initial data exploration
  • Learn how to clean data
  • Learn how to rescale data
  • Reduce the dimensionality of data
  • Q&A
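Of the data-preparation steps listed above, rescaling is the easiest to show in a few lines. The sketch below uses min-max normalization, which is one common rescaling technique. The course may also cover others, such as z-score standardization. The code is plain Java with no libraries. The class and method names are hypothetical, and it assumes a non-empty, rectangular table of numeric features.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not the course's own code.
public class MinMaxScaler {
    /**
     * Rescales each column of a non-empty, rectangular numeric table to [0, 1]
     * using x' = (x - min) / (max - min). A column whose values are all equal maps to 0.
     */
    public static double[][] rescale(double[][] data) {
        int rows = data.length;
        int cols = data[0].length;
        double[][] scaled = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            // Find the minimum and maximum of this column.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (int r = 0; r < rows; r++) {
                // Avoid dividing by zero when every value in the column is identical.
                scaled[r][c] = range == 0 ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Toy data: each row is {height in cm, weight in kg}.
        double[][] data = { {150, 50}, {160, 60}, {170, 80} };
        System.out.println(Arrays.deepToString(rescale(data)));
    }
}
```

Rescaling matters most for distance-based algorithms such as KNN, where a feature with a large numeric range would otherwise dominate the distance.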

Break (5 mins)

Learn the K-Nearest-Neighbors Algorithm (55 mins)

  • Learn the basics of KNN
  • Implement a KNN classifier in Java
  • Apply our KNN classifier to data
  • Q&A
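As a rough illustration of a from-scratch KNN classifier, here is a minimal sketch. It assumes numeric features, Euclidean distance, and a simple majority vote among the k nearest neighbors. The class name, methods, and toy data are hypothetical and not taken from the course.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical k-nearest-neighbors classifier for illustration.
public class KnnClassifier {
    // A labeled training example: a feature vector plus its class label.
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    private final List<Example> training = new ArrayList<>();
    private final int k;

    public KnnClassifier(int k) {
        this.k = k;
    }

    // KNN has no real training step: it simply stores the labeled examples.
    public void add(double[] features, String label) {
        training.add(new Example(features, label));
    }

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Predicts the majority label among the k stored examples closest to the query.
    public String predict(double[] query) {
        List<Example> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(e -> distance(e.features, query)));
        Map<String, Integer> votes = new HashMap<>();
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(e.label, 1, Integer::sum);
        }
        // Ties between labels are broken arbitrarily in this sketch.
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no training data"));
    }

    public static void main(String[] args) {
        KnnClassifier knn = new KnnClassifier(3);
        knn.add(new double[] {1.0, 1.0}, "small");
        knn.add(new double[] {1.5, 2.0}, "small");
        knn.add(new double[] {2.0, 1.0}, "small");
        knn.add(new double[] {8.0, 8.0}, "large");
        knn.add(new double[] {9.0, 8.5}, "large");
        knn.add(new double[] {8.5, 9.5}, "large");
        System.out.println(knn.predict(new double[] {1.2, 1.5})); // prints "small"
    }
}
```

Sorting every stored example on each query keeps the sketch short, but it costs O(n log n) per prediction. Larger datasets usually call for a partial selection or a spatial index instead.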

Day 2

Introduction (5 mins)

Learn the Naive Bayes Algorithm (50 mins)

  • Learn the basics of Naive Bayes
  • Implement a Naive Bayes classifier in Java
  • Apply our Naive Bayes classifier to data
  • Q&A
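For a sense of what a Naive Bayes classifier involves, here is a minimal sketch for categorical (string-valued) features. It uses add-one (Laplace) smoothing, and it sums log probabilities instead of multiplying raw probabilities to avoid numeric underflow. These are common choices but not necessarily the ones made in the course. The class name and the toy weather data are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical categorical Naive Bayes classifier for illustration.
public class NaiveBayesClassifier {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // featureCounts.get(label).get(featureIndex).get(value) = number of occurrences
    private final Map<String, Map<Integer, Map<String, Integer>>> featureCounts = new HashMap<>();
    // Distinct values seen for each feature index, used for Laplace smoothing.
    private final Map<Integer, Set<String>> featureValues = new HashMap<>();
    private int total = 0;

    // Records one labeled example by updating the class and feature-value counts.
    public void train(String[] features, String label) {
        total++;
        classCounts.merge(label, 1, Integer::sum);
        Map<Integer, Map<String, Integer>> perLabel =
                featureCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (int i = 0; i < features.length; i++) {
            perLabel.computeIfAbsent(i, j -> new HashMap<>()).merge(features[i], 1, Integer::sum);
            featureValues.computeIfAbsent(i, j -> new HashSet<>()).add(features[i]);
        }
    }

    // Returns the label maximizing log P(label) + sum over i of log P(feature_i | label).
    public String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : classCounts.entrySet()) {
            String label = entry.getKey();
            int labelCount = entry.getValue();
            double score = Math.log((double) labelCount / total);
            Map<Integer, Map<String, Integer>> perLabel = featureCounts.get(label);
            for (int i = 0; i < features.length; i++) {
                int count = perLabel.getOrDefault(i, Map.of()).getOrDefault(features[i], 0);
                int distinct = featureValues.getOrDefault(i, Set.of()).size();
                // Add-one smoothing: a value never seen with this label does not zero out the score.
                score += Math.log((count + 1.0) / (labelCount + distinct));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesClassifier nb = new NaiveBayesClassifier();
        // Toy data: {outlook, temperature} -> play outside?
        nb.train(new String[] {"sunny", "hot"}, "no");
        nb.train(new String[] {"sunny", "mild"}, "no");
        nb.train(new String[] {"rainy", "mild"}, "yes");
        nb.train(new String[] {"overcast", "hot"}, "yes");
        nb.train(new String[] {"overcast", "mild"}, "yes");
        System.out.println(nb.predict(new String[] {"overcast", "mild"})); // prints "yes"
    }
}
```

The "naive" part is the assumption that features are independent given the label. That assumption is what lets the per-feature probabilities simply be multiplied, or added in log space as here.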

Break (5 mins)

Learn the Linear Regression Algorithm (55 mins)

  • Learn the basics of Linear Regression
  • Implement a Linear Regression model in Java
  • Apply our Linear Regression model to data
  • Q&A
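Linear regression predicts a continuous value rather than a class. The sketch below fits the simplest case, a single input feature, using the closed-form ordinary least squares solution: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. The class name and data are hypothetical. The course may instead use gradient descent or multiple features.

```java
// Hypothetical one-feature linear regression for illustration.
public class SimpleLinearRegression {
    private double slope;
    private double intercept;

    // Fits y = slope * x + intercept by ordinary least squares (closed form).
    public void fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        // Accumulate the covariance of x and y and the variance of x (both unnormalized).
        double covXY = 0;
        double varX = 0;
        for (int i = 0; i < n; i++) {
            covXY += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }
        if (varX == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = covXY / varX;
        intercept = meanY - slope * meanX;
    }

    public double predict(double x) {
        return slope * x + intercept;
    }

    public double getSlope() {
        return slope;
    }

    public double getIntercept() {
        return intercept;
    }

    public static void main(String[] args) {
        SimpleLinearRegression model = new SimpleLinearRegression();
        model.fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        // The toy data lie exactly on y = 2x + 1.
        System.out.println("slope=" + model.getSlope() + ", intercept=" + model.getIntercept());
    }
}
```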

Break (5 mins)

Learn about Clustering (55 mins)

  • Learn the basics of Clustering
  • Implement a Clustering algorithm in Java
  • Apply our Clustering algorithm to data
  • Q&A
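The outline doesn't name a specific clustering algorithm. k-means is one widely used option and is assumed here. The sketch below follows Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. Initial centroids are passed in explicitly to keep the example deterministic; real implementations usually choose them randomly or with k-means++. The class name and data are hypothetical.

```java
import java.util.Arrays;

// Hypothetical k-means (Lloyd's algorithm) sketch for illustration.
public class KMeans {
    // Returns the cluster index assigned to each point.
    public static int[] cluster(double[][] points, double[][] initialCentroids, int maxIterations) {
        int k = initialCentroids.length;
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    // Squared Euclidean distance is enough for comparing which centroid is nearest.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1.5, 2}, {8, 8}, {9, 8.5} };
        double[][] initial = { {1, 1}, {8, 8} };
        System.out.println(Arrays.toString(cluster(points, initial, 100))); // prints "[0, 0, 1, 1]"
    }
}
```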

Course Wrap-up