Programming Collective Intelligence by Toby Segaran

Appendix B. Mathematical Formulas

Throughout the book I have introduced a number of mathematical concepts. This appendix covers selected concepts and gives a description, relevant formulas, and code for each of them.

Euclidean Distance

Euclidean distance is the straight-line distance between two points in multidimensional space, the kind of distance you would measure with a ruler. If the points are written as (p1, p2, p3, p4, ...) and (q1, q2, q3, q4, ...), then the formula for Euclidean distance can be expressed as shown in Figure B-1.

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

Figure B-1. Euclidean distance

A clear implementation of this formula is shown here:

def euclidean(p,q):
  sumSq=0.0

  # add up the squared differences
  for i in range(len(p)):
    sumSq+=(p[i]-q[i])**2

  # take the square root
  return (sumSq**0.5)

Euclidean distance is used in several places in this book to determine how similar two items are.
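As a quick check of the formula, the following sketch computes the distance between the points (0, 0) and (3, 4), which form the familiar 3-4-5 right triangle. The name `euclidean_distance` and the length check are assumptions made for this illustration, not part of the book's listing. The function is a compact variant of the same calculation written with `zip`.

```python
from math import sqrt

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    # Assumption for this sketch: mismatched lengths are treated as an error.
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")

    # Sum the squared coordinate differences, then take the square root.
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

A distance of 0.0 means the two points are identical, and larger values mean the items are less alike.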

Pearson Correlation Coefficient

The Pearson correlation coefficient is a measure of how highly correlated two variables are. It is a value between 1 and −1, where 1 indicates that the variables are perfectly correlated, 0 indicates no correlation, and −1 means they are perfectly inversely correlated.

Figure B-2 shows the formula for the Pearson correlation coefficient.

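$$r = \frac{\sum_i x_i y_i - \dfrac{\sum_i x_i \sum_i y_i}{n}}{\sqrt{\left(\sum_i x_i^2 - \dfrac{\left(\sum_i x_i\right)^2}{n}\right)\left(\sum_i y_i^2 - \dfrac{\left(\sum_i y_i\right)^2}{n}\right)}}$$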

Figure B-2. Pearson correlation coefficient

This can be implemented with the following code: ...
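The listing itself is elided in this excerpt. As a stand-in, here is a minimal sketch of one way to compute the coefficient using the standard sum-based form: the sum of products minus the product of the sums divided by n, all over the square root of the product of the two corresponding sum-of-squares terms. The function name `pearson`, the variable names, and the choice to return 0.0 when either variable has no variance are assumptions made for this illustration and may differ from the book's code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    # Simple sums, sums of squares, and the sum of the products.
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x_sq = sum(v * v for v in x)
    sum_y_sq = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    # Numerator and the product under the square root in the denominator.
    num = sum_xy - (sum_x * sum_y) / n
    den_sq = (sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n)

    # Assumption for this sketch: with no variance in x or y the
    # coefficient is undefined, so report 0.0 (no correlation).
    if den_sq <= 0:
        return 0.0

    return num / sqrt(den_sq)

print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [6, 4, 2]))  # -1.0
```

The two calls show the extremes described above: values that rise together give 1, and values that move in exactly opposite directions give −1.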
