Chapter 2. Linear Algebra

Now that we have spent a whole chapter acquiring data in some format or another, we will most likely end up viewing the data (in our minds) in the form of spreadsheet. It is natural to envision the names of each column going across from left to right (age, address, ID number, etc.), with each row representing a unique record or data point. Much of data science comes down to this exact formulation. What we are seeking to find is a relationship between any number of columns of interest (which we will call variables) and any number of columns that indicate a measurable outcome (which we will call responses).

Typically, we use the letter to denote the variables, and y for the responses. Likewise, the responses can be designated by a matrix Y that has a number of columns and must have the same number of rows m as X does. Note that in many cases, there is only one dimension of response variable such that p equals 1. However, it helps to generalize linear algebra problems to arbitrary ...

Get Data Science with Java now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.