Data Science at the Command Line, 2nd Edition

by Jeroen Janssens
August 2021
Beginner to intermediate content level
280 pages
6h 12m
English
O'Reilly Media, Inc.

Chapter 10. Polyglot Data Science

A polyglot is someone who speaks multiple languages. A polyglot data scientist, as I see it, is someone who uses multiple programming languages, tools, and techniques to obtain, scrub, explore, and model data.

The command line encourages a polyglot approach. The command line doesn’t care which programming language a tool is written in, as long as it adheres to the Unix philosophy. We saw that very clearly in Chapter 4, where we created command-line tools in Bash, Python, and R. Moreover, we ran SQL queries directly on CSV files and evaluated R expressions from the command line. In short, we have already been doing polyglot data science without fully realizing it!

In this chapter I’m going to take this further by flipping it around. I’m going to show you how to leverage the command line from various programming languages and environments. Because let’s be honest: we’re not going to spend our entire data science careers at the command line. As for me, when I’m analyzing some data, I often use the RStudio integrated development environment (IDE); and when I’m implementing something, I often use Python. I use whatever helps me get the job done.
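This is one way to drive command-line tools from Python. It is a minimal sketch, assuming Python 3.7 or later and a Unix-like system with `sort` and `uniq` on the `PATH`. It uses the standard-library `subprocess` module to run a tool, pass it text on standard input, and capture its standard output. The helper names `run_tool` and `run_pipeline` are hypothetical, chosen for this illustration rather than taken from the book.

```python
import subprocess


def run_tool(args, stdin_text=None):
    """Run one command-line tool and return its standard output as text.

    args is a list such as ["sort"]; stdin_text (if given) is sent to the
    tool's standard input. check=True makes subprocess.run raise
    subprocess.CalledProcessError when the tool exits with a non-zero status.
    """
    result = subprocess.run(
        args,
        input=stdin_text,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # exchange str rather than bytes
        check=True,
    )
    return result.stdout


def run_pipeline(commands, stdin_text=None):
    """Feed the output of each command into the next, similar to a shell pipe (|).

    Hypothetical helper: each stage finishes before the next one starts.
    """
    data = stdin_text
    for args in commands:
        data = run_tool(args, data)
    return data


if __name__ == "__main__":
    # Roughly equivalent to: printf 'b\na\nb\n' | sort | uniq
    print(run_pipeline([["sort"], ["uniq"]], "b\na\nb\n"), end="")
```

Unlike a real shell pipe, `run_pipeline` buffers each stage's full output in memory before starting the next command, so the tools do not run concurrently. For large data, chaining `subprocess.Popen` objects through `stdout=subprocess.PIPE` would stream instead.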

I find it comforting to know that the command line is often within arm’s reach. It allows me to quickly run a command without switching to a separate application and breaking my workflow. Examples are downloading files with curl, inspecting a piece of ...

ISBN: 9781492087908