Moving from Spreadsheets into R

After spending the previous nine chapters injecting Excel directly into your veins, I'm now going to tell you to drop it. Well, not for everything, but let's be honest, Excel is not ideal for all analytics tasks.

Excel is awesome for learning analytics, because you can touch and see your data in every state as an algorithm changes it from input into output. But you came, you saw, you learned. Do you really need to go through all those steps manually every time? For example, do you really need to bake up your own optimization formulation to fit your own logistic regressions? Do you need to type in the definition of cosine similarity all by yourself?

Now that you've learned it, you're allowed to cheat and have someone else do that for you! Think of yourself as Wolfgang Puck. Does he cook everything at all his restaurants? I sure hope not; otherwise, his skills vary wildly from airport to real world. Now that you've learned this stuff, you too should feel comfortable using other folks' implementations of these algorithms.

And that, among many other things (for example, referencing a whole table of data using one word), is why moving from Excel into the analytics-focused programming language called R is worth doing.
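To make that concrete, here's a minimal sketch of what "one word for a whole table" and "someone else's logistic regression" can look like in R. The table name `customers`, its columns, and every value in it are made up for illustration; they aren't data from the earlier chapters. The fitting is done by `glm()`, which ships with base R.

```r
# Hypothetical toy table: the names and numbers are invented for this sketch.
customers <- data.frame(
  age      = c(22, 35, 47, 51, 29, 63, 41, 38),
  visits   = c(1, 4, 2, 7, 3, 8, 2, 6),
  purchase = c(0, 1, 0, 1, 1, 1, 0, 0)
)

# One word now stands for the whole table.
summary(customers)

# glm() with family = binomial fits a logistic regression,
# so there's no hand-built optimization model to set up.
fit <- glm(purchase ~ age + visits, data = customers, family = binomial)

# The fitted intercept and one coefficient per predictor.
coef(fit)
```

Treat it as a taste rather than a recipe: a real analysis would load its data from a file and check the fit before trusting the coefficients.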

This chapter runs some of the previous chapters' analyses in R rather than Excel: same data, same algorithms, different environment. You'll see how easy this stuff can be!

Now, just as a warning, this chapter is not an introductory tutorial on R. I'm ...

