Skip to Content
Data Science at the Command Line
book

Data Science at the Command Line

by Jeroen Janssens
October 2014
Beginner to intermediate
210 pages
4h 32m
English
O'Reilly Media, Inc.
Content preview from Data Science at the Command Line

Chapter 1. Introduction

This book is about doing data science at the command line. Our aim is to make you a more efficient and productive data scientist by teaching you how to leverage the power of the command line.

Having both the terms “data science” and “command line” in the title requires an explanation. How can a technology that’s over 40 years old1 be of any use to a field that’s only a few years young?

Today, data scientists can choose from an overwhelming collection of exciting technologies and programming languages. Python, R, Hadoop, Julia, Pig, Hive, and Spark are but a few examples. You may already have experience in one or more of these. If so, then why should you still care about the command line for doing data science? What does the command line have to offer that these other technologies and programming languages do not?

These are all valid questions. This first chapter will answer these questions as follows. First, we provide a practical definition of data science that will act as the backbone of this book. Second, we’ll list five important advantages of the command line. Third, we demonstrate the power and flexibility of the command line through a real-world use case. By the end of this chapter we hope to have convinced you that the command line is indeed worth learning for doing data science.

Overview

In this chapter, you’ll learn:

  • A practical definition of data science

  • What the command line is exactly and how you can use it

  • Why the command line is a ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

Data Science at the Command Line, 2nd Edition

Data Science at the Command Line, 2nd Edition

Jeroen Janssens
Data Analytics with Hadoop

Data Analytics with Hadoop

Benjamin Bengfort, Jenny Kim
Data Science with Java

Data Science with Java

Michael R. Brzustowicz
Python Command Line Tools

Python Command Line Tools

Alfredo Deza, Noah Gift

Publisher Resources

ISBN: 9781491947845Errata Page