Data Algorithms by Mahmoud Parsian

Chapter 27. Linear Regression

This chapter presents an important statistical concept, linear regression, which has many uses, including clinical applications such as genome analysis using patient sample data. According to Wikipedia: “Linear regression is widely used in biological, behavioral and social sciences to describe possible relationships between variables. It ranks as one of the most important tools used in these disciplines.” Implementing linear regression for small data is straightforward: we can use many existing Java classes, such as SimpleRegression from Apache Commons Math. However, these classes and packages cannot handle very large data sets, because a single server has limited memory and CPU resources. Our primary goal in this chapter is to implement linear regression for huge data sets (such as genomic data represented by biosets for many patients’ sample data).

This chapter provides two distinct MapReduce/Hadoop solutions for linear regression:

  • The first solution uses the SimpleRegression class from Apache Commons Math.

  • The second solution uses R’s linear model within a MapReduce job.

Spark also provides a machine learning library, MLlib, which includes linear methods; MLlib is under active development.

The most common form of linear regression is least squares fitting. Before getting into the details of implementing linear regression, let’s define what it is and what it tells us. In simple terms, we are trying to fit an equation to a real set of ...
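To make least squares fitting concrete, here is a minimal plain-Java sketch for the one-variable case, y = intercept + slope * x. It is not the chapter's MapReduce solution and it does not use SimpleRegression; the class name LeastSquaresSketch and its methods are hypothetical, chosen only for illustration. The fit depends on just five running sums, so partial results computed on separate chunks of data can be added together, which is one reason this kind of computation can be spread across many workers.

```java
/** Running sums for a one-variable least squares fit (illustrative sketch only). */
final class LeastSquaresSketch {
    private long n;                            // number of observations seen
    private double sumX, sumY, sumXX, sumXY;   // running sums of x, y, x*x, x*y

    /** Adds one (x, y) observation. */
    void add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }

    /** Folds another partial result into this one; the sums simply add up. */
    void merge(LeastSquaresSketch other) {
        n += other.n;
        sumX += other.sumX;
        sumY += other.sumY;
        sumXX += other.sumXX;
        sumXY += other.sumXY;
    }

    /** Least squares slope, or NaN if the data cannot determine a line. */
    double slope() {
        double denom = n * sumXX - sumX * sumX;
        if (n < 2 || denom == 0.0) {
            return Double.NaN;
        }
        return (n * sumXY - sumX * sumY) / denom;
    }

    /** Least squares intercept, or NaN if the slope is undefined. */
    double intercept() {
        double b = slope();
        return Double.isNaN(b) ? Double.NaN : (sumY - b * sumX) / n;
    }

    public static void main(String[] args) {
        // Two hypothetical chunks of made-up points lying on y = 1 + 2x.
        LeastSquaresSketch part1 = new LeastSquaresSketch();
        part1.add(1, 3);
        part1.add(2, 5);

        LeastSquaresSketch part2 = new LeastSquaresSketch();
        part2.add(3, 7);
        part2.add(4, 9);

        // Combine the partial sums, then read off the fitted line.
        part1.merge(part2);
        System.out.println("slope=" + part1.slope() + " intercept=" + part1.intercept());
    }
}
```

Accumulating raw sums like this can lose precision when the values are large or nearly collinear, so a production implementation would likely use a more numerically careful update.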
