Hands-On Data Science with the Command Line

Book description

Big data processing and analytics at speed and scale using command line tools.

Key Features

  • Perform string processing, numerical computations, and more using CLI tools
  • Understand the essential components of data science development workflow
  • Automate data pipeline scripts and visualization with the command line

Book Description

The command line has existed on UNIX-based operating systems, in the form of the Bash shell, for over three decades. However, few developers know how command-line tools can be OSEMN (pronounced "awesome" and standing for Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data) for carrying out simple-to-advanced data science tasks at speed.

This book starts with the requisite concepts and installation steps for carrying out data science tasks using the command line. You will learn to create a data pipeline to solve the problem of working with small- to medium-sized files on a single machine. You will understand the power of the command line and learn how to edit files using a text-based editor. You will learn not only how to automate jobs and scripts, but also how to visualize data using the command line.

By the end of this book, you will have learned how to speed up your data science workflow and perform automated tasks using command-line tools.

What you will learn

  • Understand how to set up the command line for data science
  • Use AWK programming language commands to search large datasets quickly
  • Work with files and APIs using the command line
  • Share and collect data with CLI tools
  • Perform visualization with commands and functions
  • Uncover machine-level programming practices with a modern approach to data science

Who this book is for

This book is for data scientists and data analysts who have little to no knowledge of the command line but do have an understanding of data science. It shows how to perform everyday data science tasks using the power of command-line tools.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Data Science with the Command Line
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Conventions used
    4. Get in touch
      1. Reviews
  6. Data Science at the Command Line and Setting It Up
    1. History of the command line
    2. We don't want to BaSH other shells, but...
    3. Language-focused shells
    4. So, why the command line?
    5. Getting set up with Windows 10
    6. Getting set up on OS X
    7. Getting set up on Ubuntu Linux
      1. Getting set up with Docker
    8. Summary
  7. Essential Commands
    1. Essential commands
    2. Navigating the command line
      1. Getting help
      2. Customizing the shell
    3. Summary
  8. Shell Workflows, and Data Acquisition and Massaging
    1. Download the data
    2. Using the file command
    3. Performing a word count
    4. Introduction to cut
    5. Detached processing
      1. How to background a process
      2. Disregarding SIGHUP
      3. Terminal multiplexers
        1. Introduction to screen
      4. Sharing a screen session between multiple users
      5. Introduction to tmux
    6. Summary
  9. Bash Functions and Data Visualization
    1. My first shell script
      1. She bangs, she bangs!
      2. Function arguments, positional parameters, and IFS
        1. Prompt me baby one more time
      3. Feed the function input!
      4. Down the rabbit hole of IFS and bash arrays
    2. Advanced shell scripting magic
      1. Here be dragons, ye be warned
      2. Text injection of text files
        1. Bash networks for fun and profit!
    3. From dumb Terminal to glam Terminal
      1. Who, what, where, why, how?
      2. Enter the mind's eye
    4. Summary
  10. Loops, Functions, and String Processing
    1. Once, twice, three times a lady loops
    2. It's the end of the world as we know it while and until 
    3. The simple case
    4. Pay no heed to the magician redirecting your attention
    5. Regular expressions and grep
      1. Exact matches
      2. Character sets
      3. Dot the i (or anything else)
      4. Capture groups
      5. Either or, neither nor
      6. Repetition
      7. Other operators
      8. Putting it all together
    6. awk, sed, and tr
      1. awk
      2. sed
      3. tr
      4. sort and uniq 
        1. sort
        2. uniq
    7. Summary
  11. SQL, Math, and Wrapping it up
    1. cut and viewing data as columnar
      1. WHERE clauses
      2. Join, for joining data
      3. Group by and ordering
    2. Simulating selects
    3. Keys to the kingdom
      1. Using SQLite
    4. Math in bash itself
      1. Using let
      2. Basic arithmetic
      3. Double-parentheses
      4. bc, the unix basic calculator
      5. Math in (g)awk
    5. Python (pandas, numpy, scikit-learn)
    6. Analyzing weather data in bash
    7. Summary
  12. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Data Science with the Command Line
  • Author(s): Jason Morris, Chris McCubbin, Raymond Page
  • Release date: January 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789132984