Book description
Recipes to help you overcome your data science hurdles using Java
About This Book
This book provides modern recipes in small steps to help an apprentice cook become a master chef in data science
Use these recipes to obtain, clean, analyze, and learn from your data
Learn how to get your data science applications to production and enterprise environments effortlessly
Who This Book Is For
This book is for Java developers who are familiar with the fundamentals of data science and want to improve their skills to become a pro.
What You Will Learn
Find out how to clean and prepare datasets by removing noise and outliers so you can acquire actual insights
Develop the skills to use modern machine learning techniques to retrieve information and transform data into knowledge
Familiarize yourself with cutting-edge techniques to store and search large volumes of data and retrieve information from large amounts of data in text format
Develop basic skills to apply big data and deep learning technologies on large volumes of data
Evolve your data visualization skills and gain valuable insights from your data
Get to know a step-by-step formula to develop an industry-standard, large-scale, real-life data product
Gain the skills to visualize data and interact with users through data insights
In Detail
If you are looking to build data science models that are good for production, Java has come to the rescue. With the aid of strong libraries such as MLlib, Weka, DL4j, and more, you can efficiently perform all the data science tasks you need to.
This unique book provides modern recipes to solve your common and not-so-common data science-related problems. We start with recipes to help you obtain, clean, index, and search data. Then you will learn a variety of techniques to analyze, learn from, and retrieve information from data. You will also understand how to handle big data, learn deeply from data, and visualize data.
Finally, you will work through unique recipes that solve your problems while taking data science to production, writing distributed data science applications, and much more—things that will come in handy at work.
Style and approach
This book contains short yet very effective recipes to solve most common problems. Some recipes cater to very specific, rare pain points. The recipes cover different datasets and stay close to real production environments.
Table of contents

Java Data Science Cookbook
 Java Data Science Cookbook
 Credits
 About the Author
 About the Reviewer
 www.PacktPub.com
 Customer Feedback
 Preface

1. Obtaining and Cleaning Data
 Introduction
 Retrieving all filenames from hierarchical directories using Java
 Retrieving all filenames from hierarchical directories using Apache Commons IO
 Reading contents from text files all at once using Java 8
 Reading contents from text files all at once using Apache Commons IO
 Extracting PDF text using Apache Tika
 Cleaning ASCII text files using Regular Expressions
 Parsing Comma Separated Value (CSV) Files using Univocity
 Parsing Tab Separated Value (TSV) file using Univocity
 Parsing XML files using JDOM
 Writing JSON files using JSON.simple
 Reading JSON files using JSON.simple
 Extracting web data from a URL using JSoup
 Extracting web data from a website using Selenium Webdriver
 Reading table data from a MySQL database
2. Indexing and Searching Data

3. Analyzing Data Statistically
 Introduction
 Generating descriptive statistics
 Generating summary statistics
 Generating summary statistics from multiple distributions
 Computing frequency distribution
 Counting word frequency in a string
 Counting word frequency in a string using Java 8
 Computing simple regression
 Computing ordinary least squares regression
 Computing generalized least squares regression
 Calculating covariance of two sets of data points
 Calculating Pearson's correlation of two sets of data points
 Conducting a paired t-test
 Conducting a Chi-square test
 Conducting the one-way ANOVA test
 Conducting a Kolmogorov-Smirnov test

4. Learning from Data - Part 1
 Introduction
 Creating and saving an Attribute-Relation File Format (ARFF) file
 Cross-validating a machine learning model
 Classifying unseen test data
 Classifying unseen test data with a filtered classifier
 Generating linear regression models
 Generating logistic regression models
 Clustering data points using the KMeans algorithm
 Clustering data from classes
 Learning association rules from data
 Selecting features/attributes using the low-level method, the filtering method, and the meta-classifier method
5. Learning from Data - Part 2

6. Retrieving Information from Text Data
 Introduction
 Detecting tokens (words) using Java
 Detecting sentences using Java
 Detecting tokens (words) and sentences using OpenNLP
 Retrieving lemma, part-of-speech, and recognizing named entities from tokens using Stanford CoreNLP
 Measuring text similarity with Cosine Similarity measure using Java 8
 Extracting topics from text documents using Mallet
 Classifying text documents using Mallet
 Classifying text documents using Weka

7. Handling Big Data
 Introduction
 Training an online logistic regression model using Apache Mahout
 Applying an online logistic regression model using Apache Mahout
 Solving simple text mining problems with Apache Spark
 Clustering using KMeans algorithm with MLlib
 Creating a linear regression model with MLlib
 Classifying data points with Random Forest model using MLlib
8. Learn Deeply from Data
9. Visualizing Data
Product information
 Title: Java Data Science Cookbook
 Author(s):
 Release date: March 2017
 Publisher(s): Packt Publishing
 ISBN: 9781787122536