Programming Skills for Data Science: Start Writing Code to Wrangle, Analyze, and Visualize Data with R, First Edition

Book Description

The Foundational Hands-On Skills You Need to Dive into Data Science

Using data science techniques, you can transform raw data into actionable insights for domains ranging from urban planning to precision medicine. Programming Skills for Data Science brings together all the foundational skills you need to get started, even if you have no programming or data science experience.

Leading instructors Michael Freeman and Joel Ross guide you through installing and configuring the tools you need to solve professional-level data science problems, including the widely used R language and Git version-control system. They explain how to wrangle your data into a form where it can be easily used, analyzed, and visualized so others can see the patterns you've uncovered. Step by step, you'll master powerful R programming techniques and troubleshooting skills for probing data in new ways, and at larger scales.

Freeman and Ross teach through practical examples and exercises that can be combined into complete data science projects. Everything's focused on real-world application, so you can quickly start analyzing your own data and getting answers you can act upon. Learn to:

  • Install your complete data science environment, including R and RStudio
  • Manage projects efficiently, from version tracking to documentation
  • Host, manage, and collaborate on data science projects with GitHub
  • Master R language fundamentals: syntax, programming concepts, and data structures
  • Load, format, explore, and restructure data for successful analysis
  • Interact with databases and web APIs
  • Master key principles for visualizing data accurately and intuitively
  • Produce engaging, interactive visualizations with ggplot2 and other R packages
  • Transform analyses into sharable documents and sites with R Markdown
  • Create interactive web data science applications with Shiny
  • Collaborate smoothly as part of a data science team

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Contents
  4. Foreword
  5. Preface
  6. About the Authors
  7. I: Getting Started
    1. 1 Setting Up Your Computer
      1. 1.1 Setting up Command Line Tools
      2. 1.2 Installing git
      3. 1.3 Creating a GitHub Account
      4. 1.4 Selecting a Text Editor
      5. 1.5 Downloading the R Language
      6. 1.6 Downloading RStudio
    2. 2 Using the Command Line
      1. 2.1 Accessing the Command Line
      2. 2.2 Navigating the File System
      3. 2.3 Managing Files
      4. 2.4 Dealing with Errors
      5. 2.5 Directing Output
      6. 2.6 Networking Commands
  8. II: Managing Projects
    1. 3 Version Control with git and GitHub
      1. 3.1 What Is git?
      2. 3.2 Configuration and Project Setup
      3. 3.3 Tracking Project Changes
      4. 3.4 Storing Projects on GitHub
      5. 3.5 Accessing Project History
      6. 3.6 Ignoring Files from a Project
    2. 4 Using Markdown for Documentation
      1. 4.1 Writing Markdown
      2. 4.2 Rendering Markdown
  9. III: Foundational R Skills
    1. 5 Introduction to R
      1. 5.1 Programming with R
      2. 5.2 Running R Code
      3. 5.3 Including Comments
      4. 5.4 Defining Variables
      5. 5.5 Getting Help
    2. 6 Functions
      1. 6.1 What Is a Function?
      2. 6.2 Built-in R Functions
      3. 6.3 Loading Functions
      4. 6.4 Writing Functions
      5. 6.5 Using Conditional Statements
    3. 7 Vectors
      1. 7.1 What Is a Vector?
      2. 7.2 Vectorized Operations
      3. 7.3 Vector Indices
      4. 7.4 Vector Filtering
      5. 7.5 Modifying Vectors
    4. 8 Lists
      1. 8.1 What Is a List?
      2. 8.2 Creating Lists
      3. 8.3 Accessing List Elements
      4. 8.4 Modifying Lists
      5. 8.5 Applying Functions to Lists with lapply()
  10. IV: Data Wrangling
    1. 9 Understanding Data
      1. 9.1 The Data Generation Process
      2. 9.2 Finding Data
      3. 9.3 Types of Data
      4. 9.4 Interpreting Data
      5. 9.5 Using Data to Answer Questions
    2. 10 Data Frames
      1. 10.1 What Is a Data Frame?
      2. 10.2 Working with Data Frames
      3. 10.3 Working with CSV Data
    3. 11 Manipulating Data with dplyr
      1. 11.1 A Grammar of Data Manipulation
      2. 11.2 Core dplyr Functions
      3. 11.3 Performing Sequential Operations
      4. 11.4 Analyzing Data Frames by Group
      5. 11.5 Joining Data Frames Together
      6. 11.6 dplyr in Action: Analyzing Flight Data
    4. 12 Reshaping Data with tidyr
      1. 12.1 What Is “Tidy” Data?
      2. 12.2 From Columns to Rows: gather()
      3. 12.3 From Rows to Columns: spread()
      4. 12.4 tidyr in Action: Exploring Educational Statistics
    5. 13 Accessing Databases
      1. 13.1 An Overview of Relational Databases
      2. 13.2 A Taste of SQL
      3. 13.3 Accessing a Database from R
    6. 14 Accessing Web APIs
      1. 14.1 What Is a Web API?
      2. 14.2 RESTful Requests
      3. 14.3 Accessing Web APIs from R
      4. 14.4 Processing JSON Data
      5. 14.5 APIs in Action: Finding Cuban Food in Seattle
  11. V: Data Visualization
    1. 15 Designing Data Visualizations
      1. 15.1 The Purpose of Visualization
      2. 15.2 Selecting Visual Layouts
      3. 15.3 Choosing Effective Graphical Encodings
      4. 15.4 Expressive Data Displays
      5. 15.5 Enhancing Aesthetics
    2. 16 Creating Visualizations with ggplot2
      1. 16.1 A Grammar of Graphics
      2. 16.2 Basic Plotting with ggplot2
      3. 16.3 Complex Layouts and Customization
      4. 16.4 Building Maps
      5. 16.5 ggplot2 in Action: Mapping Evictions in San Francisco
    3. 17 Interactive Visualization in R
      1. 17.1 The plotly Package
      2. 17.2 The rbokeh Package
      3. 17.3 The leaflet Package
      4. 17.4 Interactive Visualization in Action: Exploring Changes to the City of Seattle
  12. VI: Building and Sharing Applications
    1. 18 Dynamic Reports with R Markdown
      1. 18.1 Setting up a Report
      2. 18.2 Integrating Markdown and R Code
      3. 18.3 Rendering Data and Visualizations in Reports
      4. 18.4 Sharing Reports as Websites
      5. 18.5 R Markdown in Action: Reporting on Life Expectancy
    2. 19 Building Interactive Web Applications with Shiny
      1. 19.1 The Shiny Framework
      2. 19.2 Designing User Interfaces
      3. 19.3 Developing Application Servers
      4. 19.4 Publishing Shiny Apps
      5. 19.5 Shiny in Action: Visualizing Fatal Police Shootings
    3. 20 Working Collaboratively
      1. 20.1 Tracking Different Versions of Code with Branches
      2. 20.2 Developing Projects Using Feature Branches
      3. 20.3 Collaboration Using the Centralized Workflow
      4. 20.4 Collaboration Using the Forking Workflow
    4. 21 Moving Forward
      1. 21.1 Statistical Learning
      2. 21.2 Other Programming Languages
      3. 21.3 Ethical Responsibilities