Python and R for the Modern Data Scientist

Book description

Success in data science depends on the flexible and appropriate use of tools. That includes Python and R, two of the foundational programming languages in the field. This book guides data scientists from the Python and R communities along the path to becoming bilingual. By recognizing the strengths of both languages, you'll discover new ways to accomplish data science tasks and expand your skill set.

Authors Rick Scavetta and Boyan Angelov explain the parallel structures of these languages and highlight where each one excels, whether in its linguistic features or in the power of its open source ecosystem. You'll learn how to use Python and R together in real-world settings and broaden your job opportunities as a bilingual data scientist.

  • Learn Python and R from the perspective of your current language
  • Understand the strengths and weaknesses of each language
  • Identify use cases where one language is better suited than the other
  • Understand the modern open source ecosystem available for both, including packages, frameworks, and workflows
  • Learn how to integrate R and Python in a single workflow
  • Follow a case study that demonstrates ways to use these languages together

Table of contents

  1. Preface
    1. Why We Wrote This Book
    2. Technical Interactions
    3. Who This Book Is For
    4. Prerequisites
    5. How This Book Is Organized
    6. Let’s Talk
    7. Conventions Used in This Book
    8. Using Code Examples
    9. O’Reilly Online Learning
    10. How to Contact Us
    11. Acknowledgments
  2. I. Discovery of a New Language
  3. 1. In the Beginning
    1. The Origins of R
    2. The Origins of Python
    3. The Language War Begins
    4. The Battle for Data Science Dominance
    5. A Convergence on Cooperation and Community-Building
    6. Final Thoughts
  4. II. Bilingualism I: Learning a New Language
  5. 2. R for Pythonistas
    1. Up and Running with R
    2. Projects and Packages
    3. The Triumph of Tibbles
    4. A Word About Types and Exploring
    5. Naming (Internal) Things
    6. Lists
    7. The Facts About Factors
    8. How to Find…Stuff
    9. Reiterations Redo
    10. Final Thoughts
  6. 3. Python for UseRs
    1. Versions and Builds
    2. Standard Tooling
    3. Virtual Environments
    4. Installing Packages
    5. Notebooks
    6. How Does Python, the Language, Compare to R?
      1. Import a Dataset
      2. Examine the Data
    7. Data Structures and Descriptive Statistics
      1. Data Structures: Back to the Basics
      2. Indexing and Logical Expressions
      3. Plotting
    8. Inferential Statistics
    9. Final Thoughts
  7. III. Bilingualism II: The Modern Context
  8. 4. Data Format Context
    1. External Versus Base Packages
    2. Image Data
    3. Text Data
    4. Time Series Data
      1. Base R
      2. Prophet
    5. Spatial Data
    6. Final Thoughts
  9. 5. Workflow Context
    1. Defining Workflows
    2. Exploratory Data Analysis
      1. Static Visualizations
      2. Interactive Visualizations
    3. Machine Learning
    4. Data Engineering
    5. Reporting
      1. Static Reporting
      2. Interactive Reporting
    6. Final Thoughts
  10. IV. Bilingualism III: Becoming Synergistic
  11. 6. Using the Two Languages Synergistically
    1. Faux Operability
    2. Interoperability
    3. Going Deeper
      1. Pass Objects Between R and Python in an R Markdown Document
      2. Call Python in an R Markdown Document
      3. Call Python by Sourcing a Python Script
      4. Call Python Using the REPL
      5. Call Python with Dynamic Input in an Interactive Document
    4. Final Thoughts
  12. 7. A Case Study in Bilingual Data Science
    1. 24 Years and 1.88 Million Wildfires
    2. Setup and Importing Data
    3. EDA and Data Visualization
    4. Machine Learning
      1. Setting Up Our Python Environment
      2. Feature Engineering
      3. Model Training
    5. Prediction and UI
    6. Final Thoughts
  13. A. A Python:R Bilingual Dictionary
    1. Package Management
    2. Assign Operators
    3. Types
    4. Arithmetic Operators
    5. Attributes
    6. Keywords
    7. Functions and Methods
    8. Style and Naming Conventions
    9. Analogous Data Storage Objects
    10. Data Frames
    11. Logical Expressions
    12. Indexing
  14. Index

Product information

  • Title: Python and R for the Modern Data Scientist
  • Author(s): Rick J. Scavetta, Boyan Angelov
  • Release date: June 2021
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492093350