Data Science at the Command Line
video

by Jeroen Janssens
August 2014
Intermediate
1h 59m
English
O'Reilly Media, Inc.
Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)

Overview

We data scientists love to create exciting data visualizations and insightful statistical models. Before we get to that point, however, much effort usually goes into obtaining, scrubbing, and exploring the required data.

The command line, although invented decades ago, is an amazing environment for performing such data science tasks. By combining small yet powerful command-line tools, you can quickly explore your data and hack together prototypes. New tools such as GNU Parallel, jq, and Drake let you apply the command line to today's data challenges. Even if you're already comfortable processing data with, for example, R or Python, also being able to leverage the power of the command line can make you a more efficient data scientist.

We will make use of the Data Science Toolbox, a free, open-source virtual environment that lets anyone get started with data science in minutes. The Data Science Toolbox runs not only on Linux but also on Mac OS X and Microsoft Windows, so everybody can participate in this hands-on webcast.

In about two hours we will cover the following subjects:

  • Essential concepts of the *nix command line;
  • Setting up the Data Science Toolbox;
  • Integrating the command line with IPython and R;
  • Filters such as cut, grep, sed, and awk;
  • Scraping websites using curl, scrape, xml2json, and jq;
  • Managing your data science workflow using Drake;
  • Parallelizing and distributing data-intensive pipelines using GNU Parallel;
  • Turning existing Python, R, and Java code into reusable command-line tools;
  • Creating data visualizations and statistical models.


Whether you're entirely new to the command line or already dreaming in shell scripts, by the end of this webcast you will have a solid understanding of how to leverage the power of the command line for your next data science project.

Publisher Resources

ISBN: 9781491915165