Book Description
Scala will be a valuable tool to have on hand during your data science journey for everything from data cleaning to cutting-edge machine learning.
About This Book
 Build data science and data engineering solutions with ease
 An in-depth look at each stage of the data analysis process, from reading and collecting data to distributed analytics
 Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulations, and source code
Who This Book Is For
This learning path is perfect for those who are comfortable with Scala programming and now want to enter the field of data science. Some knowledge of statistics is expected.
What You Will Learn
 Transfer and filter tabular data to extract features for machine learning
 Read, clean, transform, and write data to both SQL and NoSQL databases
 Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations
 Load data from HDFS and Hive with ease
 Run streaming and graph analytics in Spark for exploratory analysis
 Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
 Build dynamic workflows for scientific computing
 Leverage open source libraries to extract patterns from time series
 Master probabilistic models for sequential data
In Detail
Scala is especially good for analyzing large data sets, because increasing the scale of the task does not significantly affect performance. Scala's powerful functional libraries can interact with databases and build scalable frameworks, resulting in robust data pipelines.
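Interacting with databases safely usually means guaranteeing that every connection is closed, even when a query fails. The first module's JDBC chapter covers this under "the loan pattern"; the following is only a minimal, standard-library sketch of that idea and makes no claim about the book's own code. The names `withResource` and `FakeConnection` are hypothetical, and `FakeConnection` stands in for a real `java.sql.Connection` so the example runs without a database.

```scala
// Sketch of the loan pattern: the helper owns the resource's lifecycle and
// "lends" it to the caller's function, closing it even if that function throws.
// withResource and FakeConnection are hypothetical names, not a library API.
object LoanPatternSketch {

  def withResource[R <: AutoCloseable, A](acquire: => R)(use: R => A): A = {
    val resource = acquire
    try use(resource)
    finally resource.close() // always runs, on success or on exception
  }

  // Stand-in for a JDBC connection so the sketch runs without a database.
  final class FakeConnection extends AutoCloseable {
    var closed = false
    def query(sql: String): Seq[String] = Seq(s"row from: $sql")
    override def close(): Unit = closed = true
  }

  def main(args: Array[String]): Unit = {
    val conn = new FakeConnection
    val rows = withResource(conn)(_.query("SELECT * FROM users"))
    println(rows)        // List(row from: SELECT * FROM users)
    println(conn.closed) // true
  }
}
```

Because `java.sql.Connection` implements `AutoCloseable`, the same helper could wrap a connection obtained from `DriverManager.getConnection(url)`. On Scala 2.13 and later, `scala.util.Using` in the standard library offers a comparable, more complete facility.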
The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real-world examples, you will learn how to design scalable architectures to process and model data, starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will learn how to build interactive visualizations with web frameworks.
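As a rough illustration of the simple concurrency constructs the first module starts from, the sketch below uses the standard library's `Future` to sum chunks of a vector concurrently and then combine the partial results. The object and method names, the chunk size, and the data are illustrative assumptions, not taken from the book.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Sketch: Futures as a simple concurrency construct.
// FuturesSketch and parallelSum are hypothetical names.
object FuturesSketch {

  // Split the data into chunks, sum each chunk asynchronously, then combine.
  def parallelSum(data: Vector[Long], chunkSize: Int): Future[Long] = {
    require(chunkSize > 0, "chunkSize must be positive")
    val partials: Vector[Future[Long]] =
      data.grouped(chunkSize).map(chunk => Future(chunk.sum)).toVector
    // Future.sequence turns Vector[Future[Long]] into Future[Vector[Long]].
    Future.sequence(partials).map(_.sum)
  }

  def main(args: Array[String]): Unit = {
    val data = (1L to 1000L).toVector
    val total = Await.result(parallelSum(data, 100), 5.seconds)
    println(total) // 500500
  }
}
```

The global execution context keeps the sketch short; larger programs would typically pass an `ExecutionContext` explicitly so that thread usage is under their control.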
Once you are familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You'll see how Scala can be used to make sense of data through easy-to-follow recipes. You will learn about Bokeh bindings for exploratory data analysis and about quintessential machine learning algorithms in the Spark ML library. You'll also gain a solid understanding of Spark Streaming, machine learning for streaming data, and Spark GraphX.
Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science: machine learning. The final module teaches you the A to Z of machine learning with Scala. You'll explore Scala's support for dependency injection and implicits, which are used to write machine learning algorithms. You'll also explore machine learning topics such as clustering, dimensionality reduction, Naïve Bayes, regression models, SVMs, neural networks, and more.
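To make "dependency injection and implicits" concrete, here is a minimal sketch under our own assumptions: implicit `Numeric[T]` evidence lets one `mean` function work on any numeric feature type, and a self-type lets a classifier declare that it needs a preprocessor without fixing which implementation it gets. Every object, trait, and method name is hypothetical; this illustrates the two language features rather than the book's design.

```scala
// Sketch: implicits (type-class evidence) and self-type dependency injection.
// All names below are hypothetical.
object ImplicitsAndDI {

  // Implicit Numeric[T] evidence lets one mean() serve Int, Long, Double features.
  def mean[T](xs: Seq[T])(implicit num: Numeric[T]): Double = {
    require(xs.nonEmpty, "mean of an empty sequence is undefined")
    num.toDouble(xs.sum) / xs.size
  }

  // The dependency: any component able to normalize a feature column.
  trait Preprocessor {
    def normalize(xs: Seq[Double]): Seq[Double]
  }

  // One possible implementation: min-max scaling to [0, 1].
  // A constant input maps to zeros; an empty input is returned unchanged.
  trait MinMaxPreprocessor extends Preprocessor {
    def normalize(xs: Seq[Double]): Seq[Double] =
      if (xs.isEmpty) xs
      else {
        val (lo, hi) = (xs.min, xs.max)
        if (hi == lo) xs.map(_ => 0.0) else xs.map(x => (x - lo) / (hi - lo))
      }
  }

  // The self-type declares "I need a Preprocessor" without choosing one.
  trait ThresholdClassifier { self: Preprocessor =>
    def threshold: Double
    // Label each normalized value 1 if it reaches the threshold, else 0.
    def classify(xs: Seq[Double]): Seq[Int] =
      normalize(xs).map(x => if (x >= threshold) 1 else 0)
  }

  // The concrete dependency is wired in when the object is instantiated.
  val model: ThresholdClassifier = new ThresholdClassifier with MinMaxPreprocessor {
    val threshold = 0.5
  }

  def main(args: Array[String]): Unit = {
    println(mean(List(1, 2, 3, 4)))                   // 2.5
    println(mean(Vector(0.5, 1.5)))                   // 1.0
    println(model.classify(Seq(2.0, 4.0, 6.0, 10.0))) // List(0, 0, 1, 1)
  }
}
```

Swapping `MinMaxPreprocessor` for another `Preprocessor` changes the classifier's behavior without touching `ThresholdClassifier`, which is the point of injecting the dependency through the self-type.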
This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products:
 Scala for Data Science, Pascal Bugnion
 Scala Data Analysis Cookbook, Arun Manivannan
 Scala for Machine Learning, Patrick R. Nicolas
Style and approach
A complete package with all the information necessary to start building useful data engineering and data science solutions straight away. It contains a diverse set of recipes that cover the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala.
Table of Contents

Scala: Guide for Data Science Professionals
 Credits
 Preface

I. Module 1
 1. Scala and Data Science

2. Manipulating Data with Breeze
 Code examples
 Installing Breeze
 Getting help on Breeze

Basic Breeze data types
 Vectors
 Dense and sparse vectors and the vector trait
 Matrices
 Building vectors and matrices
 Advanced indexing and slicing
 Mutating vectors and matrices
 Matrix multiplication, transposition, and the orientation of vectors
 Data preprocessing and feature engineering
 Breeze – function optimization
 Numerical derivatives
 Regularization
 An example – logistic regression
 Towards reusable code
 Alternatives to Breeze
 Summary
 References
 3. Plotting with breeze-viz
 4. Parallel Collections and Futures

5. Scala and SQL through JDBC
 Interacting with JDBC
 First steps with JDBC
 JDBC summary
 Functional wrappers for JDBC
 Safer JDBC connections with the loan pattern
 Enriching JDBC statements with the "pimp my library" pattern
 Wrapping result sets in a stream
 Looser coupling with type classes
 Creating a data access layer
 Summary
 References
 6. Slick – A Functional Interface for SQL
 7. Web APIs
 8. Scala and MongoDB

9. Concurrency with Akka
 GitHub follower graph
 Actors as people
 Hello world with Akka
 Case classes as messages
 Actor construction
 Anatomy of an actor
 Follower network crawler
 Fetcher actors
 Routing
 Message passing between actors
 Queue control and the pull pattern
 Accessing the sender of a message
 Stateful actors
 Follower network crawler
 Fault tolerance
 Custom supervisor strategies
 Lifecycle hooks
 What we have not talked about
 Summary
 References
 10. Distributed Batch Processing with Spark

11. Spark SQL and DataFrames
 DataFrames – a whirlwind introduction
 Aggregation operations
 Joining DataFrames together
 Custom functions on DataFrames
 DataFrame immutability and persistence
 SQL statements on DataFrames
 Complex data types – arrays, maps, and structs
 Interacting with data sources
 Standalone programs
 Summary
 References
 12. Distributed Machine Learning with MLlib

13. Web APIs with Play
 Client-server applications
 Introduction to web frameworks
 Model-View-Controller architecture
 Single page applications
 Building an application
 The Play framework
 Dynamic routing
 Actions
 Interacting with JSON
 Querying external APIs and consuming JSON
 Creating APIs with Play: a summary
 REST APIs: best practice
 Summary
 References
 14. Visualization with D3 and the Play Framework
 A. Pattern Matching and Extractors

II. Module 2

1. Getting Started with Breeze
 Introduction
 Getting Breeze – the linear algebra library

Working with vectors
 Getting ready

How to do it...
 Creating vectors
 Constructing a vector from values
 Creating a vector out of a function
 Creating a vector of linearly spaced values
 Creating a vector with values in a specific range
 Creating an entire vector with a single value
 Slicing a subvector from a bigger vector
 Creating a Breeze Vector from a Scala Vector
 Vector arithmetic
 Scalar operations
 Calculating the dot product of two vectors
 Creating a new vector by adding two vectors together
 Appending vectors and converting a vector of one type to another
 Concatenating two vectors
 Standard deviation
 Find the largest value in a vector
 Finding the sum, square root and log of all the values in the vector
 Working with matrices

Vectors and matrices with randomly distributed values

How it works...
 Creating vectors with uniformly distributed random values
 Creating vectors with normally distributed random values
 Creating vectors with random values that have a Poisson distribution
 Creating a matrix with uniformly random values
 Creating a matrix with normally distributed random values
 Creating a matrix with random values that have a Poisson distribution

How it works...
 Reading and writing CSV files

2. Getting Started with Apache Spark DataFrames
 Introduction
 Getting Apache Spark
 Creating a DataFrame from CSV
 Manipulating DataFrames
 Creating a DataFrame from Scala case classes
 3. Loading and Preparing Data – DataFrame
 4. Data Visualization

5. Learning from Data
 Introduction
 Supervised and unsupervised learning
 Gradient descent
 Predicting continuous values using linear regression
 Binary classification using LogisticRegression and SVM

Binary classification using LogisticRegression with Pipeline API

How to do it...
 Importing and splitting data as test and training sets
 Constructing the participants of the pipeline
 Preparing a pipeline and training a model
 Predicting against test data
 Evaluating a model without cross-validation
 Constructing parameters for cross-validation
 Constructing a cross-validator and fitting the best model
 Evaluating the model with cross-validation

How to do it...
 Clustering using K-means

Feature reduction using principal component analysis

How to do it...
 Dimensionality reduction of data for supervised learning
 Mean-normalizing the training data
 Extracting the principal components
 Preparing the labeled data
 Preparing the test data
 Classify and evaluate the metrics
 Dimensionality reduction of data for unsupervised learning
 Mean-normalizing the training data
 Extracting the principal components
 Arriving at the number of components
 Evaluating the metrics

How to do it...
 6. Scaling Up
 7. Going Further


III. Module 3
 1. Getting Started
 2. Hello World!
 3. Data Preprocessing
 4. Unsupervised Learning
 5. Naïve Bayes Classifiers
 6. Regression and Regularization
 7. Sequential Data Models
 8. Kernel Models and Support Vector Machines

9. Artificial Neural Networks
 Feedforward neural networks (FFNN)

The multilayer perceptron (MLP)
 The activation function
 The network architecture
 Software design
 Model definition
 Training cycle/epoch
 Training strategies and classification
 Evaluation
 Benefits and limitations
 Summary
 10. Genetic Algorithms
 11. Reinforcement Learning
 12. Scalable Frameworks
 B. Basic Concepts
 C. Bibliography
 Index
Product Information
 Title: Scala: Guide for Data Science Professionals
 Author(s): Pascal Bugnion, Arun Manivannan, Patrick R. Nicolas
 Release date: February 2017
 Publisher(s): Packt Publishing
 ISBN: 9781787282858