1 Open-Source Tools for Data Science

1.1 R Language and RStudio

In this first section, we introduce the main tools of the R environment: the R language and the RStudio IDE (integrated development environment). The first is an open-source programming language, developed by the community specifically for statistical analysis and data science; the second is an open-source development tool produced by Posit (www.posit.com), the company formerly called RStudio, and is the standard IDE for R-based data science projects. Posit offers a free version of RStudio, called RStudio Desktop, that fully supports all features for R development; it was used (v. 2022.07.2) to prepare all the R code presented in this book. Commercial versions of RStudio add supporting features typical of managing production software in corporate environments. An alternative to RStudio Desktop is RStudio Cloud, the same IDE offered as a cloud service. Graphically and functionally, the cloud version is the same as the desktop one; however, its free usage has limitations.

The official distributions of the R language and the RStudio IDE are just the starting point, though. This is what distinguishes an open-source technology from a proprietary one. With an open-source technology actively developed by a large online community, as is the case for R, the official distribution provides the basic functionality and, on top of that, layers of additional, advanced, or specialized features ...
