Once a kernel function has been selected, it's possible to build a complete approximation of a probability density function using a k-nearest neighbors approach. In fact, given a dataset X (for simplicity, X ∈ ℜm, so the values are real numbers), it's easy to create, for example, a ball-tree (as discussed in Chapter 2, Clustering Fundamentals) to partition the data in an efficient way. When the data structure is ready, it's possible to obtain all the neighbors of query point xj inside a radius defined by the bandwidth. Let's assume that such a set is Xj = {x1, ..., xt} and the number of points is Nj. The estimation of the probability density is obtained as follows:
It's not difficult to prove that, if the bandwidth ...