Chapter 9. Advanced Classification: Kernel Methods and SVMs

Previous chapters have considered several classifiers, including decision trees, Bayesian classifiers, and neural networks. This chapter introduces linear classifiers and kernel methods as a prelude to one of the most advanced classifiers, support-vector machines (SVMs), which remain an active area of research.

The dataset used throughout much of the chapter pertains to matching people on a dating site. Given information about two people, can we predict whether they will be a good match? This is an interesting problem because there are many variables, both numerical and nominal, and many nonlinear relationships. The dataset will be used to demonstrate some of the weaknesses of the previously described classifiers, and to show how it can be tweaked to work better with these algorithms. An important thing to take away from this chapter is that it’s rarely possible to throw a complex dataset at an algorithm and expect it to learn how to classify things accurately. Choosing the right algorithm and preprocessing the data appropriately are often required to get good results. I hope that going through the process of tweaking this dataset will give you ideas for how to modify others in the future.
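To make the idea of preprocessing concrete, here is a minimal sketch of two common tweaks: mapping nominal yes/no answers to numbers, and rescaling numeric columns to a common range. The field names, the sample records, and the helper functions `yes_no` and `rescale` are hypothetical and chosen only for illustration; they are not taken from the chapter's dataset or code.

```python
def yes_no(value):
    """Map a nominal yes/no answer to a number; anything else becomes 0."""
    if value == 'yes':
        return 1
    if value == 'no':
        return -1
    return 0


def rescale(rows):
    """Rescale every column of a list of numeric rows to the range 0..1."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(value, low, high):
        # A constant column carries no information; map it to 0.
        return (value - low) / (high - low) if high > low else 0.0

    return [[scale(v, lo, hi) for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


# Hypothetical records: (age, smoker?, wants children?)
raw = [(39, 'yes', 'no'), (23, 'no', 'no'), (48, 'no', 'yes')]

# Step 1: turn nominal fields into numbers.
numeric = [[age, yes_no(smoker), yes_no(children)]
           for age, smoker, children in raw]

# Step 2: put all columns on the same 0..1 scale.
scaled = rescale(numeric)
```

Without the rescaling step, a column with a wide range such as age would dominate any distance-based comparison between rows, which is one reason this kind of tweak can matter for the classifiers discussed here.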

At the end of the chapter, you’ll learn how to build a dataset of real people from Facebook, a popular social networking site, and you’ll use the algorithms to predict whether people with ...
