Python for R Users

Book Description

The definitive guide for statisticians and data scientists who understand the advantages of becoming proficient in both R and Python

The first book of its kind, Python for R Users: A Data Science Approach makes it easy for R programmers to code in Python and Python users to program in R. Short on theory and long on actionable analytics, it provides readers with a detailed comparative introduction and overview of both languages and features concise tutorials with command-by-command translations—complete with sample code—of R to Python and Python to R.

Following an introduction to both languages, the author cuts to the chase with step-by-step coverage of the full range of pertinent programming features and functions, including data input, data inspection/data quality, data analysis, and data visualization. Statistical modeling, machine learning, and data mining—including supervised and unsupervised data mining methods—are treated in detail, as are time series forecasting, text mining, and natural language processing.

• Features a quick-learning format with concise tutorials and actionable analytics

• Provides command-by-command translations of R to Python and vice versa

• Incorporates Python and R code throughout to make it easier for readers to compare and contrast features in both languages

• Offers numerous comparative examples and applications in both programming languages

• Designed for practitioners and students who know one language and want to learn the other

• Supplies slides on a companion website that are useful for teaching and learning either language

Python for R Users: A Data Science Approach is a valuable working resource for computer scientists and data scientists who know R and would like to learn Python, or who are familiar with Python and want to learn R. It also functions as a textbook for students of computer science and statistics.

A. Ohri is the founder of Decisionstats.com and currently works as a senior data scientist. He has advised multiple startups in analytics off-shoring, analytics services, and analytics education, as well as in using social media to enhance buzz for analytics products. Mr. Ohri's research interests include spreading open source analytics, analyzing social media manipulation with mechanism design, simpler interfaces for cloud computing, and investigating climate change and knowledge flows. His other books include R for Business Analytics and R for Cloud Computing.

Table of Contents

  1. Cover
  2. Title Page
  3. Preface
  4. Acknowledgments
  5. Scope
  6. Purpose
  7. Plan
  8. The Zen of Python
  9. 1 Introduction to Python R and Data Science
    1. 1.1 What Is Python?
    2. 1.2 What Is R?
    3. 1.3 What Is Data Science?
    4. 1.4 The Future for Data Scientists
    5. 1.5 What Is Big Data?
    6. 1.6 Business Analytics Versus Data Science
    7. 1.7 Tools Available to Data Scientists
    8. 1.8 Packages in Python for Data Science
    9. 1.9 Similarities and Differences between Python and R
    10. 1.10 Tutorials
    11. 1.11 Using R and Python Together
    12. 1.12 Other Software and Python
    13. 1.13 Using SAS with Jupyter
    14. 1.14 How Can You Use Python and R for Big Data Analytics?
    15. 1.15 What Is Cloud Computing?
    16. 1.16 How Can You Use Python and R on the Cloud?
    17. 1.17 Commercial Enterprise and Alternative Versions of Python and R
    18. 1.18 Data‐Driven Decision Making: A Note
    19. Bibliography
  10. 2 Data Input
    1. 2.1 Data Input in Pandas
    2. 2.2 Web Scraping Data Input
    3. 2.3 Data Input from RDBMS
  11. 3 Data Inspection and Data Quality
    1. 3.1 Data Formats
    2. 3.2 Data Quality
    3. 3.3 Data Inspection
    4. 3.4 Data Selection
    5. 3.5 Data Inspection in R
    6. Bibliography
  12. 4 Exploratory Data Analysis
    1. 4.1 Group by Analysis
    2. 4.2 Numerical Data
    3. 4.3 Categorical Data
  13. 5 Statistical Modeling
    1. 5.1 Concepts in Regression
    2. 5.2 Correlation Is Not Causation
    3. 5.3 Linear Regression in R and Python
    4. 5.4 Logistic Regression in R and Python
    5. References
  14. 6 Data Visualization
    1. 6.1 Concepts on Data Visualization
    2. 6.2 Tufte’s Work on Data Visualization
    3. 6.3 Stephen Few on Dashboard Design
    4. 6.4 Basic Plots
    5. 6.5 Advanced Plots
    6. 6.6 Interactive Plots
    7. 6.7 Spatial Analytics
    8. 6.8 Data Visualization in R
    9. Bibliography
  15. 7 Machine Learning Made Easier
    1. 7.1 Deleting Columns We Don’t Need in the Final Decision Tree Model
    2. 7.2 Time Series
    3. 7.3 Association Analysis
    4. 7.4 Cleaning Corpus and Making Bag of Words
  16. 8 Conclusion and Summary
  17. Index
  18. End User License Agreement