Chapter 2. Linear Algebra

Now that we have spent a whole chapter acquiring data in some format or another, we will most likely end up viewing the data (in our minds) in the form of a spreadsheet. It is natural to envision the column names going across from left to right (age, address, ID number, etc.), with each row representing a unique record or data point. Much of data science comes down to this exact formulation. What we are seeking is a relationship between any number of columns of interest (which we will call variables) and any number of columns that indicate a measurable outcome (which we will call responses).

Typically, we use the letter x to denote the variables, and y for the responses. Likewise, the responses can be designated by a matrix Y that has p columns and must have the same number of rows m as X, the matrix of variables, does. Note that in many cases, there is only one dimension of response variable, such that p equals 1. However, it helps to generalize linear algebra problems to arbitrary ...
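To make the shape constraint concrete, here is a minimal sketch using plain Java arrays. The class name `MatrixShapes`, its helper methods, the symbol n for the number of variable columns, and all of the numbers are assumptions made for illustration only; they do not come from the text. The only point is that X and Y must share the same row count m, while Y is free to have p = 1 or more columns.

```java
// Hypothetical helper class; names and data are illustrative assumptions.
final class MatrixShapes {

    /** Number of rows (data points, m) in a rectangular matrix. */
    static int rows(double[][] a) {
        return a.length;
    }

    /** Number of columns in a rectangular matrix; 0 if the matrix has no rows. */
    static int cols(double[][] a) {
        return a.length == 0 ? 0 : a[0].length;
    }

    /** True when X and Y describe the same number of data points. */
    static boolean sameRowCount(double[][] x, double[][] y) {
        return rows(x) == rows(y);
    }

    public static void main(String[] args) {
        // X: m = 3 data points, n = 2 variables per data point (made-up values).
        double[][] x = {
            {34.0, 1.0},
            {51.0, 0.0},
            {27.0, 1.0}
        };
        // Y: the same m = 3 rows, with a single response column (p = 1).
        double[][] y = {
            {120.5},
            {98.0},
            {143.2}
        };

        System.out.println("X is " + rows(x) + " x " + cols(x));
        System.out.println("Y is " + rows(y) + " x " + cols(y));
        System.out.println("Row counts match: " + sameRowCount(x, y));
    }
}
```

Storing Y as a two-dimensional array even when p equals 1 keeps the same code path usable if more response columns are added later.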
