Chapter 12: Machine Learning with R

When you're in a room of data scientists, statisticians, and math types, you'll hear one letter crop up again and again: the letter R. R is a programming language that is driven mainly from the command line. If you used the Spark shell in Chapter 11, then you're already familiar with the shell concept; R works the same way. In addition to being used interactively in the shell, R code can be saved in script files and run.
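To make the shell-versus-script distinction concrete, here is a minimal sketch. The values and the file name heights.R are made up for illustration; the same lines behave the same way whether you type them at the R prompt one at a time or save them to a file.

```r
# These lines can be typed one at a time at the R prompt,
# or saved to a file (hypothetically named heights.R) and run as a script.
heights <- c(160, 170, 180)    # a small numeric vector of made-up values
mean_height <- mean(heights)   # arithmetic mean of the vector
print(mean_height)             # prints [1] 170
```

Assuming R is installed and on your path, a script saved this way can be run from an operating system terminal with Rscript heights.R.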

Why am I telling you all this? Well, on top of the programming skills that get mentioned, you might also be asked, “Do you do R?” After this chapter, you'll hopefully have a starting point to reply, “Yes!”

Installing R

The R language comes ready to use for a number of operating systems. The download page lists a number of mirror sites, so pick the mirror that's closest to you. From the mirror, choose the download for your operating system.


For Mac OS X, the current version of R (3.1.1 at time of writing) comes in two separate download types: one for users running Snow Leopard and the other for Mavericks. The latter is built on Xcode 5 compiler binaries. Download the file and open it to install. It installs the R binaries to the /Applications folder.


The .exe download for Windows provides binaries for running on 32- or 64-bit machines. The base package download will provide you with everything you need to get started.


Binary downloads are available for Debian, Ubuntu, Red Hat, and SUSE Linux distributions. ...
