
Programming Collective Intelligence

Name: Programming Collective Intelligence
Author: Toby Segaran
ISBN: 9780596550684


August 2007

Beginner to intermediate

362 pages

10h 11m

English

O'Reilly Media, Inc.


Programming Collective Intelligence
A Note Regarding Supplemental Files
Praise for Programming Collective Intelligence
Preface
Prerequisites; Style of Examples; Why Python?; Python Tips; List and dictionary constructors; Significant Whitespace; List comprehensions; Open APIs; Overview of the Chapters; Conventions; Using Code Examples; How to Contact Us; Safari® Books Online; Acknowledgments
1. Introduction to Collective Intelligence
What Is Collective Intelligence?; What Is Machine Learning?; Limits of Machine Learning; Real-Life Examples; Other Uses for Learning Algorithms
2. Making Recommendations
Collaborative Filtering; Collecting Preferences; Finding Similar Users; Euclidean Distance Score; Pearson Correlation Score; Which Similarity Metric Should You Use?; Ranking the Critics; Recommending Items; Matching Products; Building a del.icio.us Link Recommender; The del.icio.us API; Building the Dataset; Recommending Neighbors and Links; Item-Based Filtering; Building the Item Comparison Dataset; Getting Recommendations; Using the MovieLens Dataset; User-Based or Item-Based Filtering?; Exercises
3. Discovering Groups
Supervised versus Unsupervised Learning; Word Vectors; Pigeonholing the Bloggers; Counting the Words in a Feed; Hierarchical Clustering; Drawing the Dendrogram; Column Clustering; K-Means Clustering; Clusters of Preferences; Getting and Preparing the Data; Beautiful Soup; Scraping the Zebo Results; Defining a Distance Metric; Clustering Results; Viewing Data in Two Dimensions; Other Things to Cluster; Exercises
4. Searching and Ranking
What’s in a Search Engine?; A Simple Crawler; Using urllib2; Crawler Code; Building the Index; Setting Up the Schema; Finding the Words on a Page; Adding to the Index; Querying; Content-Based Ranking; Normalization Function; Word Frequency; Document Location; Word Distance; Using Inbound Links; Simple Count; The PageRank Algorithm; Using the Link Text; Learning from Clicks; Design of a Click-Tracking Network; Setting Up the Database; Feeding Forward; Training with Backpropagation; Training Test; Connecting to the Search Engine; Exercises
5. Optimization
Group Travel; Representing Solutions; The Cost Function; Random Searching; Hill Climbing; Simulated Annealing; Genetic Algorithms; Real Flight Searches; The Kayak API; The minidom Package; Flight Searches; Optimizing for Preferences; Student Dorm Optimization; The Cost Function; Running the Optimization; Network Visualization; The Layout Problem; Counting Crossed Lines; Drawing the Network; Other Possibilities; Exercises
6. Document Filtering
Filtering Spam; Documents and Words; Training the Classifier; Calculating Probabilities; Starting with a Reasonable Guess; A Naïve Classifier; Probability of a Whole Document; A Quick Introduction to Bayes’ Theorem; Choosing a Category; The Fisher Method; Category Probabilities for Features; Combining the Probabilities; Classifying Items; Persisting the Trained Classifiers; Using SQLite; Filtering Blog Feeds; Improving Feature Detection; Using Akismet; Alternative Methods; Exercises

7. Modeling with Decision Trees
Predicting Signups; Introducing Decision Trees; Training the Tree; Choosing the Best Split; Gini Impurity; Entropy; Recursive Tree Building; Displaying the Tree; Graphical Display; Classifying New Observations; Pruning the Tree; Dealing with Missing Data; Dealing with Numerical Outcomes; Modeling Home Prices; The Zillow API; Modeling “Hotness”; When to Use Decision Trees; Exercises
8. Building Price Models
Building a Sample Dataset; k-Nearest Neighbors; Number of Neighbors; Defining Similarity; Code for k-Nearest Neighbors; Weighted Neighbors; Inverse Function; Subtraction Function; Gaussian Function; Weighted kNN; Cross-Validation; Heterogeneous Variables; Adding to the Dataset; Scaling Dimensions; Optimizing the Scale; Uneven Distributions; Estimating the Probability Density; Graphing the Probabilities; Using Real Data—the eBay API; Getting a Developer Key; Setting Up a Connection; Performing a Search; Getting Details for an Item; Building a Price Predictor; When to Use k-Nearest Neighbors; Exercises
9. Advanced Classification: Kernel Methods and SVMs
Matchmaker Dataset; Difficulties with the Data; Decision Tree Classifier; Basic Linear Classification; Categorical Features; Yes/No Questions; Lists of Interests; Determining Distances Using Yahoo! Maps; Getting a Yahoo! Application Key; Using the Geocoding API; Calculating the Distance; Creating the New Dataset; Scaling the Data; Understanding Kernel Methods; The Kernel Trick; Support-Vector Machines; Using LIBSVM; Getting LIBSVM; A Sample Session; Applying SVM to the Matchmaker Dataset; Matching on Facebook; Getting a Developer Key; Creating a Session; Download Friend Data; Building a Match Dataset; Creating an SVM Model; Exercises
10. Finding Independent Features
A Corpus of News; Selecting Sources; Downloading Sources; Converting to a Matrix; Previous Approaches; Bayesian Classification; Clustering; Non-Negative Matrix Factorization; A Quick Introduction to Matrix Math; What Does This Have to Do with the Articles Matrix?; Using NumPy; The Algorithm; Displaying the Results; Displaying by Article; Using Stock Market Data; What Is Trading Volume?; Downloading Data from Yahoo! Finance; Preparing a Matrix; Running NMF; Displaying the Results; Exercises
11. Evolving Intelligence
What Is Genetic Programming?; Genetic Programming Versus Genetic Algorithms; Programs As Trees; Representing Trees in Python; Building and Evaluating Trees; Displaying the Program; Creating the Initial Population; Testing a Solution; A Simple Mathematical Test; Measuring Success; Mutating Programs; Crossover; Building the Environment; The Importance of Diversity; A Simple Game; A Round-Robin Tournament; Playing Against Real People; Further Possibilities; More Numerical Functions; Memory; Different Datatypes; Exercises
12. Algorithm Summary
Bayesian Classifier; Training; Classifying; Using Your Code; Strengths and Weaknesses; Decision Tree Classifier; Training; Using Your Decision Tree Classifier; Strengths and Weaknesses; Neural Networks; Training a Neural Network; Using Your Neural Network Code; Strengths and Weaknesses; Support-Vector Machines; The Kernel Trick; Using LIBSVM; Strengths and Weaknesses; k-Nearest Neighbors; Scaling and Superfluous Variables; Using Your kNN Code; Strengths and Weaknesses; Clustering; Hierarchical Clustering; K-Means Clustering; Using Your Clustering Code; Multidimensional Scaling; Using Your Multidimensional Scaling Code; Non-Negative Matrix Factorization; Using Your NMF Code; Optimization; The Cost Function; Simulated Annealing; Genetic Algorithms; Using Your Optimization Code
A. Third-Party Libraries
Universal Feed Parser; Installation for All Platforms; Python Imaging Library; Installation on Windows; Installation on Other Platforms; Simple Usage Example; Beautiful Soup; Installation on All Platforms; Simple Usage Example; pysqlite; Installation on Windows; Installation on Other Platforms; Simple Usage Example; NumPy; Installation on Windows; Installation on Other Platforms; Simple Usage Example; matplotlib; Installation; Simple Usage Example; pydelicious; Installation for All Platforms; Simple Usage Example
B. Mathematical Formulas
Euclidean Distance; Pearson Correlation Coefficient; Weighted Mean; Tanimoto Coefficient; Conditional Probability; Gini Impurity; Entropy; Variance; Gaussian Function; Dot-Products
Index
About the Author
Colophon
Copyright

Content preview from Programming Collective Intelligence

Chapter 12. Algorithm Summary

This book has introduced a number of different algorithms, and if you’ve been working through the examples, you now have Python code that implements many of them. Each earlier chapter is structured around an example problem, with algorithms and variations introduced along the way. This chapter is a reference for the algorithms covered: when you want to do some data mining or machine learning on a new dataset, you can look at the algorithms here, decide which one is appropriate, and use the code you’ve already written to analyze your data.

To save you from going back through the book to find the details of an algorithm, I’ll provide a description of each one, a high-level overview of how it works, what sort of datasets you can apply it to, and how you would use the code you’ve previously written to run it. I’ll also mention some of the strengths and weaknesses of each algorithm (or, if you like, how to sell the idea to your boss). In some cases, I’ll use examples to help explain the properties of the algorithm. These examples are greatly simplified—most are so simple you can solve them just by looking at the data yourself—but they are useful for illustration.

Supervised learning methods, which guess a classification or a value based on training examples, will be covered first.
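As a rough illustration of what "guessing a classification based on training examples" can look like, here is a minimal sketch of a word-count classifier. It is not the book's code: the function names `train` and `classify`, the toy training data, and the choice of add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count word occurrences per label from (text, label) training pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training examples
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score for the text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior: share of examples with this label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one smoothing (an assumption) so unseen words never give log(0).
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set: (document text, label).
examples = [
    ("cheap pills buy now", "bad"),
    ("buy cheap watches", "bad"),
    ("meeting notes attached", "good"),
    ("lunch meeting tomorrow", "good"),
]
model = train(examples)
print(classify("buy cheap pills", *model))
print(classify("notes from the meeting", *model))
```

The training step only stores counts; the guess for a new document comes from comparing how well each label's counts explain its words. The same two-phase shape (learn from labeled examples, then predict for unseen input) applies to the other supervised methods listed in this chapter.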

Bayesian Classifier

Bayesian classifiers were covered in Chapter 6. In that chapter, you saw how to create a document classification system, such as ...


Publisher Resources

ISBN: 9780596529321
