Practical Machine Learning

Author: Sunila Gollapudi
ISBN: 9781784399689
Publisher: Packt Publishing
Published: January 2016
Level: Beginner to intermediate
Length: 468 pages
Estimated reading time: 10h 35m
Language: English

Table of Contents
Practical Machine Learning
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introduction to Machine learning
Machine learning
Definition
Core Concepts and Terminology
What is learning?
Data
Labeled and unlabeled data
Tasks
Algorithms
Models
Logical models
Geometric models
Probabilistic models
Data and inconsistencies in Machine learning
Under-fitting
Over-fitting
Data instability
Unpredictable data formats
Practical Machine learning examples
Types of learning problems
Classification
Clustering
Forecasting, prediction or regression
Simulation
Optimization
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Deep learning
Performance measures
Is the solution good?
Mean squared error (MSE)
Mean absolute error (MAE)
Normalized MSE and MAE (NMSE and NMAE)
Solving the errors: bias and variance
Some complementing fields of Machine learning
Data mining
Artificial intelligence (AI)
Statistical learning
Data science
Machine learning process lifecycle and solution architecture
Machine learning algorithms
Decision tree based algorithms
Bayesian method based algorithms
Kernel method based algorithms
Clustering methods
Artificial neural networks (ANN)
Dimensionality reduction
Ensemble methods
Instance based learning algorithms
Regression analysis based algorithms
Association rule based learning algorithms
Machine learning tools and frameworks
Summary
2. Machine learning and Large-scale datasets
Big data and the context of large-scale Machine learning
Functional versus Structural – A methodological mismatch
Commoditizing information
Theoretical limitations of RDBMS
Scaling-up versus Scaling-out storage
Distributed and parallel computing strategies
Machine learning: Scalability and Performance
Too many data points or instances
Too many attributes or features
Shrinking response time windows – need for real-time responses
Highly complex algorithm
Feed forward, iterative prediction cycles
Model selection process
Potential issues in large-scale Machine learning
Algorithms and Concurrency
Developing concurrent algorithms
Technology and implementation options for scaling-up Machine learning
MapReduce programming paradigm
High Performance Computing (HPC) with Message Passing Interface (MPI)
Language Integrated Queries (LINQ) framework
Manipulating datasets with LINQ
Graphics Processing Unit (GPU)
Field Programmable Gate Array (FPGA)
Multicore or multiprocessor systems
Summary
3. An Introduction to Hadoop's Architecture and Ecosystem
Introduction to Apache Hadoop
Evolution of Hadoop (the platform of choice)
Hadoop and its core elements
Machine learning solution architecture for big data (employing Hadoop)
The Data Source layer
The Ingestion layer
The Hadoop Storage layer
The Hadoop (Physical) Infrastructure layer – supporting appliance
Hadoop platform / Processing layer
The Analytics layer
The Consumption layer
Explaining and exploring data with Visualizations
Security and Monitoring layer
Hadoop core components framework
Hadoop Distributed File System (HDFS)
Secondary Namenode and Checkpoint process
Splitting large data files
Block loading to the cluster and replication
Writing to and reading from HDFS
Handling failures
HDFS command line
RESTful HDFS
MapReduce
MapReduce architecture
What makes MapReduce cater to the needs of large datasets?
MapReduce execution flow and components
Developing MapReduce components
InputFormat
OutputFormat
Mapper implementation
Hadoop 2.x
Hadoop ecosystem components
Hadoop installation and setup
Installing JDK 1.7
Creating a system user for Hadoop (dedicated)
Disable IPv6
Steps for installing Hadoop 2.6.0
Starting Hadoop
Hadoop distributions and vendors
Summary
4. Machine Learning Tools, Libraries, and Frameworks
Machine learning tools – A landscape
Apache Mahout
How does Mahout work?
Installing and setting up Apache Mahout
Setting up Maven
Setting up Apache Mahout using Eclipse IDE
Setting up Apache Mahout without Eclipse
Mahout Packages
Implementing vectors in Mahout
R
Installing and setting up R
Integrating R with Apache Hadoop
Approach 1 – Using R and Streaming APIs in Hadoop
Approach 2 – Using the Rhipe package of R
Approach 3 – Using RHadoop
Summary of R/Hadoop integration approaches
Implementing in R (using examples)
R Expressions
Assignments
Functions
R Vectors
Assigning, accessing, and manipulating vectors
R Matrices
R Factors
R Data Frames
R Statistical frameworks
Julia
Installing and setting up Julia
Downloading and using the command line version of Julia
Using Juno IDE for running Julia
Using Julia via the browser
Running the Julia code from the command line
Implementing in Julia (with examples)
Using variables and assignments
Numeric primitives
Data structures
Working with Strings and String manipulations
Packages
Interoperability
Integrating with C
Integrating with Python
Integrating with MATLAB
Graphics and plotting
Benefits of adopting Julia
Integrating Julia and Hadoop
Python
Toolkit options in Python
Implementation of Python (using examples)
Installing Python and setting up scikit-learn
Loading data
Apache Spark
Scala
Programming with Resilient Distributed Datasets (RDD)
Spring XD
Summary
5. Decision Tree based learning
Decision trees
Terminology
Purpose and uses
Constructing a Decision tree
Handling missing values
Considerations for constructing Decision trees
Choosing the appropriate attribute(s)
Information gain and Entropy
Gini index
Gain ratio
Termination Criteria / Pruning Decision trees
Decision trees in a graphical representation
Inducing Decision trees – Decision tree algorithms
CART
C4.5
Greedy Decision trees
Benefits of Decision trees
Specialized trees
Oblique trees
Random forests
Evolutionary trees
Hellinger trees
Implementing Decision trees
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
6. Instance and Kernel Methods Based Learning
Instance-based learning (IBL)
Nearest Neighbors
Value of k in KNN
Distance measures in KNN
Euclidean distance
Hamming distance
Minkowski distance
Case-based reasoning (CBR)
Locally weighted regression (LWR)
Implementing KNN
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Kernel methods-based learning
Kernel functions
Support Vector Machines (SVM)
Inseparable Data
Implementing SVM
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
7. Association Rules based learning
Association rules based learning
Association rule – a definition
Apriori algorithm
Rule generation strategy
Rules for defining appropriate minsup
Apriori – the downside
FP-growth algorithm
Apriori versus FP-growth
Implementing Apriori and FP-growth
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
8. Clustering based learning
Clustering-based learning
Types of clustering
Hierarchical clustering
Partitional clustering
The k-means clustering algorithm
Convergence or stopping criteria for the k-means clustering
K-means clustering on disk
Advantages of the k-means approach
Disadvantages of the k-means algorithm
Distance measures
Complexity measures
Implementing k-means clustering
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
9. Bayesian learning
Bayesian learning
Statistician's thinking
Important terms and definitions
Probability
Types of events
Mutually exclusive or disjoint events
Independent events
Dependent events
Types of probability
Distribution
Bernoulli distribution
Binomial distribution
Poisson probability distribution
Exponential distribution
Normal distribution
Relationship between the distributions
Bayes' theorem
Naïve Bayes classifier
Multinomial Naïve Bayes classifier
The Bernoulli Naïve Bayes classifier
Implementing Naïve Bayes algorithm
Using Mahout
Using R
Using Spark
Using scikit-learn
Using Julia
Summary
10. Regression based learning
Regression analysis
Revisiting statistics
Properties of expectation, variance, and covariance
Properties of variance
Properties of covariance
Example
ANOVA and F Statistics
Confounding
Effect modification
Regression methods
Simple regression or simple linear regression
Multiple regression
Polynomial (non-linear) regression
Generalized Linear Models (GLM)
Logistic regression (logit link)
Odds ratio in logistic regression
Model
Poisson regression
Implementing linear and logistic regression
Using Mahout
Using R
Using Spark
Using scikit-learn
Using Julia
Summary
11. Deep learning
Background
The human brain
Neural networks
Neuron
Synapses
Artificial neurons or perceptrons
Linear neurons
Rectified linear neurons / linear threshold neurons
Binary threshold neurons
Sigmoid neurons
Stochastic binary neurons
Neural Network size
An example
Neural network types
Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
Jordan networks
Elman networks
Radial Basis Function (RBF) networks
Hopfield networks
Dynamic Learning Vector Quantization (DLVQ) networks
Gradient descent method
Backpropagation algorithm
Softmax regression technique
Deep learning taxonomy
Convolutional neural networks (CNN/ConvNets)
Convolutional layer (CONV)
Pooling layer (POOL)
Fully connected layer (FC)
Recurrent Neural Networks (RNNs)
Restricted Boltzmann Machines (RBMs)
Deep Boltzmann Machines (DBMs)
Autoencoders
Implementing ANNs and Deep learning methods
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
12. Reinforcement learning
Reinforcement Learning (RL)
The context of Reinforcement Learning
Examples of Reinforcement Learning
Evaluative Feedback
n-Armed Bandit problem
Action-value methods
Reinforcement comparison methods
The Reinforcement Learning problem – the world grid example
Markov Decision Process (MDP)
Basic RL model – agent-environment interface
Delayed rewards
The policy
Reinforcement Learning – key features
Reinforcement learning solution methods
Dynamic Programming (DP)
Generalized Policy Iteration (GPI)
Monte Carlo methods
Temporal difference (TD) learning
Sarsa – on-Policy TD
Q-Learning – off-Policy TD
Actor-critic methods (on-policy)
R Learning (Off-policy)
Implementing Reinforcement Learning algorithms
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
13. Ensemble learning
Ensemble learning methods
The wisdom of the crowd
Key use cases
Recommendation systems
Anomaly detection
Transfer learning
Stream mining or classification
Ensemble methods
Supervised ensemble methods
Boosting
AdaBoost
Bagging
Wagging
Random forests
Gradient boosting machines (GBM)
Unsupervised ensemble methods
Implementing ensemble methods
Using Mahout
Using R
Using Spark
Using Python (scikit-learn)
Using Julia
Summary
14. New generation data architectures for Machine learning
Evolution of data architectures
Emerging perspectives & drivers for new age data architectures
Modern data architectures for Machine learning
Semantic data architecture
The business data lake
Semantic Web technologies
Ontology and data integration
Vendors
Multi-model database architecture / polyglot persistence
Vendors
Lambda Architecture (LA)
Vendors
Summary
Index

Overview

Delve into the exciting world of machine learning with 'Practical Machine Learning' by Sunila Gollapudi. This book takes a hands-on approach to understanding and implementing machine learning techniques. It equips you with tools such as Python, R, Julia, and Spark to tackle real-world data challenges. It goes beyond theory, offering practical insight into deep learning, reinforcement learning, and much more.

What this book will help me do

Understand and confidently apply machine learning algorithms to analyze complex datasets.
Harness the power of programming languages like Python, R, and Julia to build sophisticated machine learning projects.
Effectively utilize big data platforms such as Hadoop and Spark for advanced data processing.
Master a variety of machine learning techniques, from decision trees to deep learning methodologies.
Gain practical experience in integrating your machine learning workflows with tools for scalable analytics.

Author(s)

Sunila Gollapudi is an experienced data scientist and educator with a strong background in applying machine learning to address complex problems. With a talent for simplifying challenging concepts, Sunila equips readers with actionable skills and deeper knowledge of machine learning and big data platforms. Her engaging writing translates technical expertise into an approachable learning experience.

Who is it for?

This book is ideal for data scientists, big data professionals, and software engineers aiming to enhance their machine learning expertise. If you're familiar with Python, R, or similar technologies and want to delve into practical machine learning applications, you'll find this book invaluable. Whether you're solving real-world analytics challenges or aiming to explore scalable machine learning, this is the resource for you.
