# Chapter 5

# Quadratic Programming and Machine Learning – Large Scale Problems and Sparsity

## 5.1. Introduction

For a child, learning is a complex process that consists of acquiring or developing certain competencies on the basis of multiple experiences. For a machine, this learning process can be reduced to examples or observations that are used to improve performance. Machine learning can be seen as the optimization of criteria defined on examples. The higher the number of examples, the better the learning process. In terms of optimization, this learning process includes several specific problems. How are the criteria to be optimized defined? How can we manage large amounts of data? Which algorithm is efficient in this context?

When dealing with those problems, neural network approaches suggest the usage of non-convex criteria associated with gradient descent methods. This procedure leads to several difficulties linked to the non-convex criteria. The key to the success of kernel-based methods (obtained about a decade after the introduction of neural networks) is their capacity to express the learning problem as a large scale quadratic programming problem (convex). Kernel-based methods often lead to sparse solutions, i.e. a large number of their components equal zero. Based on this particularity, learning algorithms can solve large scale problems in a reasonable time. Solving this type of problem currently takes about one day of calculating when using a mono-processor for 8 million ...