Machine Learning: End-to-End guide for Java developers

by Richard M. Reese, Jennifer L. Reese, Boštjan Kaluža, Dr. Uday Kamath, Krishna Choppella

October 2017

Intermediate to advanced

1159 pages

26h 10m

English

Packt Publishing



Content preview from Machine Learning: End-to-End guide for Java developers

Basic naive Bayes classifier baseline

Under the rules of the challenge, participants had to outperform the basic naive Bayes classifier to qualify for prizes. This classifier assumes that the features are independent of one another (refer to Chapter 1, Applied Machine Learning Quick Start).
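The independence assumption can be illustrated with a minimal sketch, which is not the organizers' implementation. Because the features are treated as independent given the class, the score of each class is its prior multiplied by the per-feature likelihoods. The class name `NaiveBayesSketch`, the method `posterior`, and the probabilities in `main` are hypothetical and chosen only for illustration.

```java
// Minimal sketch of the naive Bayes independence assumption (hypothetical names and numbers).
final class NaiveBayesSketch {

    /**
     * Returns the posterior probability of each class.
     * priors[k] is P(class k); likelihoods[k][i] is P(feature i's observed value | class k).
     * Assumes all probabilities are strictly positive.
     */
    static double[] posterior(double[] priors, double[][] likelihoods) {
        int classes = priors.length;
        double[] logScores = new double[classes];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < classes; k++) {
            // Independence assumption: the joint likelihood is a product of per-feature terms,
            // which becomes a sum in log space (this avoids underflow with many features).
            double score = Math.log(priors[k]);
            for (double p : likelihoods[k]) {
                score += Math.log(p);
            }
            logScores[k] = score;
            max = Math.max(max, score);
        }
        // Normalize so that the posteriors sum to 1.
        double[] result = new double[classes];
        double total = 0.0;
        for (int k = 0; k < classes; k++) {
            result[k] = Math.exp(logScores[k] - max);
            total += result[k];
        }
        for (int k = 0; k < classes; k++) {
            result[k] /= total;
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: two classes, two features.
        double[] priors = {0.5, 0.5};
        double[][] likelihoods = {{0.8, 0.5}, {0.2, 0.5}};
        double[] post = posterior(priors, likelihoods);
        System.out.println(post[0] + " " + post[1]);
    }
}
```

Working in log space is a common design choice here, since multiplying many small probabilities directly can underflow on datasets with a large number of features.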

The KDD Cup organizers ran the vanilla naive Bayes classifier without any feature selection or hyperparameter adjustments. For the large dataset, the overall naive Bayes scores on the test set were as follows:

Churn problem: AUC = 0.6468
Appetency problem: AUC = 0.6453
Upselling problem: AUC = 0.7211
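For readers unfamiliar with the metric, AUC (area under the ROC curve) can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch under that interpretation, not the scoring code used by the challenge. The class name `AucSketch`, the method `auc`, and the sample scores are hypothetical.

```java
// Minimal sketch of AUC as a pairwise ranking statistic (hypothetical names and data).
final class AucSketch {

    /**
     * Computes AUC by comparing every positive example with every negative example.
     * A positive scored above a negative counts as 1, a tie counts as 0.5.
     * This is O(n^2) and intended for illustration, not for large datasets.
     */
    static double auc(double[] scores, boolean[] labels) {
        if (scores.length != labels.length) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!labels[i]) {
                continue; // outer loop visits positives only
            }
            for (int j = 0; j < scores.length; j++) {
                if (labels[j]) {
                    continue; // inner loop visits negatives only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Illustrative scores only: three of the four positive/negative pairs are ranked correctly.
        double[] scores = {0.9, 0.8, 0.4, 0.3};
        boolean[] labels = {true, false, true, false};
        System.out.println(auc(scores, labels)); // prints 0.75
    }
}
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the baseline scores above in context.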

Note that the baseline results are reported for the large dataset only. Moreover, while both training and test datasets are provided at the KDD Cup site, the actual true labels ...


Publisher Resources

ISBN: 9781788622219
