Java Data Science Cookbook

Book Description

Recipes to help you overcome your data science hurdles using Java

About This Book

  • This book provides modern recipes in small steps to help an apprentice cook become a master chef in data science
  • Use these recipes to obtain, clean, analyze, and learn from your data
  • Learn how to get your data science applications to production and enterprise environments effortlessly

Who This Book Is For

This book is for Java developers who are familiar with the fundamentals of data science and want to improve their skills to become a pro.

What You Will Learn

  • Find out how to clean and make datasets ready so you can acquire actual insights by removing noise and outliers
  • Develop the skills to use modern machine learning techniques to retrieve information and transform data into knowledge
  • Familiarize yourself with cutting-edge techniques to store and search large volumes of data and retrieve information from large amounts of data in text format
  • Develop basic skills to apply big data and deep learning technologies on large volumes of data
  • Evolve your data visualization skills and gain valuable insights from your data
  • Get to know a step-by-step formula to develop an industry-standard, large-scale, real-life data product
  • Gain the skills to visualize data and interact with users through data insights

In Detail

If you are looking to build data science models that are good for production, Java has come to the rescue. With the aid of strong libraries such as MLlib, Weka, DL4j, and more, you can efficiently perform all the data science tasks you need to.

This unique book provides modern recipes to solve your common and not-so-common data science-related problems. We start with recipes to help you obtain, clean, index, and search data. Then you will learn a variety of techniques to analyze, learn from, and retrieve information from data. You will also understand how to handle big data, learn deeply from data, and visualize data.

Finally, you will work through unique recipes that solve your problems while taking data science to production, writing distributed data science applications, and much more—things that will come in handy at work.

Style and approach

This book contains short, effective recipes that solve the most common problems, and some cater to very specific, rare pain points. The recipes cover different data sets and closely mirror real production environments.

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Java Data Science Cookbook
    1. Java Data Science Cookbook
    2. Credits
    3. About the Author
    4. About the Reviewer
    5. www.PacktPub.com
      1. Why subscribe?
    6. Customer Feedback
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Sections
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
        5. See also
      5. Conventions
      6. Reader feedback
      7. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book 
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Obtaining and Cleaning Data
      1. Introduction
      2. Retrieving all filenames from hierarchical directories using Java
        1. Getting ready
        2. How to do it...
      3. Retrieving all filenames from hierarchical directories using Apache Commons IO
        1. Getting ready
        2. How to do it...
      4. Reading contents from text files all at once using Java 8
        1. How to do it...
      5. Reading contents from text files all at once using Apache Commons IO
        1. Getting ready
        2. How to do it...
      6. Extracting PDF text using Apache Tika
        1. Getting ready
        2. How to do it...
      7. Cleaning ASCII text files using Regular Expressions
        1. How to do it...
      8. Parsing Comma Separated Value (CSV) Files using Univocity
        1. Getting ready
        2. How to do it...
      9. Parsing Tab Separated Value (TSV) files using Univocity
        1. Getting ready
        2. How to do it...
      10. Parsing XML files using JDOM
        1. Getting ready
        2. How to do it...
      11. Writing JSON files using JSON.simple
        1. Getting ready
        2. How to do it...
      12. Reading JSON files using JSON.simple
        1. Getting ready
        2. How to do it...
      13. Extracting web data from a URL using JSoup
        1. Getting ready
        2. How to do it...
      14. Extracting web data from a website using Selenium Webdriver
        1. Getting ready
        2. How to do it...
      15. Reading table data from a MySQL database
        1. Getting ready
        2. How to do it...
    9. 2. Indexing and Searching Data
      1. Introduction
      2. Indexing data with Apache Lucene
        1. Getting ready
        2. How to do it...
        3. How it works...
      3. Searching indexed data with Apache Lucene
        1. Getting ready
        2. How to do it...
    10. 3. Analyzing Data Statistically
      1. Introduction
      2. Generating descriptive statistics
        1. How to do it...
      3. Generating summary statistics
        1. How to do it...
      4. Generating summary statistics from multiple distributions
        1. How to do it...
        2. There's more...
      5. Computing frequency distribution
        1. How to do it...
      6. Counting word frequency in a string
        1. How to do it...
        2. How it works...
      7. Counting word frequency in a string using Java 8
        1. How to do it...
      8. Computing simple regression
        1. How to do it...
      9. Computing ordinary least squares regression
        1. How to do it...
      10. Computing generalized least squares regression
        1. How to do it...
      11. Calculating covariance of two sets of data points
        1. How to do it...
      12. Calculating Pearson's correlation of two sets of data points
        1. How to do it...
      13. Conducting a paired t-test
        1. How to do it...
      14. Conducting a Chi-square test
        1. How to do it...
      15. Conducting the one-way ANOVA test
        1. How to do it...
      16. Conducting a Kolmogorov-Smirnov test
        1. How to do it...
    11. 4. Learning from Data - Part 1
      1. Introduction
      2. Creating and saving an Attribute-Relation File Format (ARFF) file
        1. How to do it...
      3. Cross-validating a machine learning model
        1. How to do it...
      4. Classifying unseen test data
        1. Getting ready
        2. How to do it...
      5. Classifying unseen test data with a filtered classifier
        1. How to do it...
      6. Generating linear regression models
        1. How to do it...
      7. Generating logistic regression models
        1. How to do it...
      8. Clustering data points using the KMeans algorithm
        1. How to do it...
      9. Clustering data from classes
        1. How to do it...
      10. Learning association rules from data
        1. Getting ready
        2. How to do it...
      11. Selecting features/attributes using the low-level method, the filtering method, and the meta-classifier method
        1. Getting ready
        2. How to do it...
    12. 5. Learning from Data - Part 2
      1. Introduction
      2. Applying machine learning on data using Java Machine Learning (Java-ML) library
        1. Getting ready
        2. How to do it...
      3. Classifying data points using the Stanford classifier
        1. Getting ready
        2. How to do it...
        3. How it works...
      4. Classifying data points using Massive Online Analysis (MOA)
        1. Getting ready
        2. How to do it...
      5. Classifying multilabeled data points using Mulan
        1. Getting ready
        2. How to do it...
    13. 6. Retrieving Information from Text Data
      1. Introduction
      2. Detecting tokens (words) using Java
        1. Getting ready
        2. How to do it...
      3. Detecting sentences using Java
        1. Getting ready
        2. How to do it...
      4. Detecting tokens (words) and sentences using OpenNLP
        1. Getting ready
        2. How to do it...
      5. Retrieving lemma, part-of-speech, and recognizing named entities from tokens using Stanford CoreNLP
        1. Getting ready
        2. How to do it...
      6. Measuring text similarity with Cosine Similarity measure using Java 8
        1. Getting ready
        2. How to do it...
      7. Extracting topics from text documents using Mallet
        1. Getting ready
        2. How to do it...
      8. Classifying text documents using Mallet
        1. Getting ready
        2. How to do it...
      9. Classifying text documents using Weka
        1. Getting ready
        2. How to do it...
    14. 7. Handling Big Data
      1. Introduction
      2. Training an online logistic regression model using Apache Mahout
        1. Getting ready
        2. How to do it...
      3. Applying an online logistic regression model using Apache Mahout
        1. Getting ready
        2. How to do it...
      4. Solving simple text mining problems with Apache Spark
        1. Getting ready
        2. How to do it...
      5. Clustering using the KMeans algorithm with MLlib
        1. Getting ready
        2. How to do it...
      6. Creating a linear regression model with MLlib
        1. Getting ready
        2. How to do it...
      7. Classifying data points with a Random Forest model using MLlib
        1. Getting ready
        2. How to do it...
    15. 8. Learn Deeply from Data
      1. Introduction
      2. Creating a Word2vec neural net using Deep Learning for Java (DL4j)
        1. How to do it...
        2. How it works...
        3. There's more...
      3. Creating a Deep Belief neural net using Deep Learning for Java (DL4j)
        1. How to do it...
        2. How it works...
      4. Creating a deep autoencoder using Deep Learning for Java (DL4j)
        1. How to do it...
        2. How it works...
    16. 9. Visualizing Data
      1. Introduction
      2. Plotting a 2D sine graph
        1. Getting ready
        2. How to do it...
      3. Plotting histograms
        1. Getting ready
        2. How to do it...
      4. Plotting a bar chart
        1. Getting ready
        2. How to do it...
      5. Plotting box plots or whisker diagrams
        1. Getting ready
        2. How to do it...
      6. Plotting scatter plots
        1. Getting ready
        2. How to do it...
      7. Plotting donut plots
        1. Getting ready
        2. How to do it...
      8. Plotting area graphs
        1. Getting ready
        2. How to do it...