Introduction

This text introduces the fundamentals of data science using two main programming languages and open-source technologies : R and Python. These are accompanied by the respective application contexts formed by tools to support coding scripts, i.e. logical sequences of instructions with the aim to produce certain results or functionalities. The tools can be of the command line interface (CLI) type, which are consoles to be used with textual commands, and integrated development environment (IDE), which are of interactive type to support the use of languages. Other elements that make up the application context are the supplementary libraries that contain the additional functions in addition to the basic ones coming with the language, package managers for the automated management of the download and installation of new libraries, online documentation, cheat sheets, tutorials, and online forums of discussion and help for users. This context, formed by a language, tools, additional features, discussions between users, and online documentation produced by developers, is what we mean when we say "R" and "Python," not the simple programming language tool, which by itself would be very little. It is like talking only about the engine when instead you want to explain how to drive a car on busy roads.

R and Python, together and with the meaning just described, represent the knowledge to start approaching data science, carry out the first simple steps, complete the educational ...

Get Data Science Fundamentals with R, Python, and Open Data now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.