We have uploaded a subset of the data, as well as notes and code to help replicate our analyses, at http://data.doloreslabs.com.
If you're interested in learning R, we recommend two websites:
R's official website is http://www.r-project.org. If you are interested in how it compares to other data analysis packages, see the many comments on an early draft of Table 17-2 at http://anyall.org/blog/?p=421.
The most commonly recommended book for learning R is Peter Dalgaard's Introductory Statistics with R (Springer; 2008).
Beyond R's core functionality, we used several add-on packages, including corrgram, flowCore, gclus, geneplotter, plyr, and pixmap.
Good overviews of clustering, loess, and other machine learning techniques are in The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (Springer; 2008).
The section on tags barely scratches the surface of statistical language analysis. For more, see the chapters on corpus linguistics in Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze (MIT Press; 1999), as well as Speech and Language Processing by Daniel Jurafsky and James H. Martin (Prentice Hall; 2008).
There are many better ways to estimate confidence intervals for the attractiveness versus age ...