Exploring the Data
There are many great tools for data analysis. Some of the most commonly used are compared in Table 17-2.
Table 17-2. Comparison of data analysis packages
Advantages | Disadvantages
Library support; visualization | Steep learning curve
Elegant matrix support; visualization | Expensive; incomplete statistics support
Python: flexible and general-purpose programming language | Components poorly integrated
Easy; visual; flexible | Large data sets; weak numeric and programming support
Very large data sets | Very baroque; hardest to learn
Easy statistical analysis
Science (bio and social)
We like to use R, which is an open source statistical and visualization programming environment with a vibrant and growing development community. It's emerged as a de facto standard among statisticians. For exploratory data analysis, we prefer it to the other options because of its graphing libraries, convenient indexing notation, and an amazing array of statistically sophisticated, community-maintained packages. You can read about it and download it at http://www.r-project.org; also look at the references at the end of this chapter.
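To give a flavor of the indexing notation mentioned above, here is a minimal sketch in base R. The vector and data frame are invented purely for illustration (they are not from the data set used in this chapter), and the names scores and df are hypothetical.

```r
# Invented values, for illustration only.
scores <- c(3, 4, 7, 6, 5)

scores[2]             # a single element (R indexes from 1)
scores[scores > 4]    # logical indexing: elements greater than 4
scores[-1]            # negative index: everything except the first element

# A small hypothetical table with two columns.
df <- data.frame(id = c("a", "b", "c"), score = c(3, 7, 5))

df[df$score > 4, ]    # rows where score exceeds 4, all columns
df$score              # one column, as a vector
df[, "id"]            # selecting a column by name
```

The same bracket syntax works for vectors, matrices, and data frames, which is a large part of why filtering and slicing feel compact at the interactive prompt.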
R provides many excellent tools for looking at what's in the data. From its interactive interpreter:
Load the data
> data = read.delim("http://data.doloreslabs.com/face_scores.tsv", sep="\t")
and plot.
> plot(data)
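Besides plot, a few base R functions give a quick textual look at a freshly loaded table. The sketch below is self-contained and does not touch the network: it writes a small, invented tab-separated file to a temporary location and reads it back with read.delim. The column names (item, worker, score) and all values are assumptions made for illustration, not the actual contents of face_scores.tsv.

```r
# Hypothetical stand-in for a tab-separated file; the column names and
# values are invented for illustration, not taken from face_scores.tsv.
toy <- data.frame(
  item   = c("a", "a", "b", "b", "c"),
  worker = c("w1", "w2", "w1", "w3", "w2"),
  score  = c(3, 4, 7, 6, 5)
)
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim reads tab-separated text and expects a header row by default.
data <- read.delim(path, sep = "\t")

dim(data)      # number of rows and columns
str(data)      # column names and types
head(data, 3)  # first few rows
summary(data)  # per-column summaries
```

Running str and summary before plotting is a cheap way to catch columns that were parsed with an unexpected type.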
Given a basic table of ...