Book description
This book explains the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions. The book's collection of projects, exercises, and sample solutions encompasses practical topics pertaining to data processing and analysis. The book can be used for self-study or as supplementary reading in a statistical computing course, allowing students to gain valuable data science skills.
Table of contents
- Preliminaries
- Series
- Dedication
- Preface
- Acknowledgments
- Authors
- Co-Authors
- Part I Data Manipulation and Modeling
- Chapter 1 Predicting Location via Indoor Positioning Systems
- Chapter 2 Modeling Runners' Times in the Cherry Blossom Race
- 2.1 Introduction
- 2.2 Reading Tables of Race Results into R
- 2.3 Data Cleaning and Reformatting Variables
- 2.4 Exploring the Run Time for All Male Runners
- 2.5 Constructing a Record for an Individual Runner across Years
- 2.6 Modeling the Change in Running Time for Individuals
- 2.7 Scraping Race Results from the Web
- 2.8 Exercises
- Bibliography
- Chapter 3 Using Statistics to Identify Spam
- 3.1 Introduction
- 3.2 Anatomy of an email Message
- 3.3 Reading the email Messages
- 3.4 Text Mining and Naïve Bayes Classification
- 3.5 Finding the Words in a Message
- 3.6 Implementing the Naïve Bayes Classifier
- 3.7 Recursive Partitioning and Classification Trees
- 3.8 Organizing an email Message into an R Data Structure
- 3.9 Deriving Variables from the email Message
- 3.10 Exploring the email Feature Set
- 3.11 Fitting the rpart() Model to the email Data
- 3.12 Exercises
- Bibliography
- Chapter 4 Processing Robot and Sensor Log Files: Seeking a Circular Target
- Chapter 5 Strategies for Analyzing a 12-Gigabyte Data Set: Airline Flight Delays
- Part II Simulation Studies
- Chapter 6 Pairs Trading
- Chapter 7 Simulation Study of a Branching Process
- 7.1 Introduction
- 7.2 Exploring the Random Process
- 7.3 Generating Offspring
- 7.4 Profiling and Improving Our Code
- 7.5 From One Job's Offspring to an Entire Generation
- 7.6 Unit Testing
- 7.7 A Structure for the Function's Return Value
- 7.8 The Family Tree: Simulating the Branching Process
- 7.9 Replicating the Simulation
- 7.10 Exercises
- Bibliography
- Chapter 8 A Self-Organizing Dynamic System with a Phase Transition
- Chapter 9 Simulating Blackjack
- Part III Data and Web Technologies
- Chapter 10 Baseball: Exploring Data in a Relational Database
- Chapter 11 CIA Factbook Mashup
- 11.1 Introduction
- 11.2 Acquiring the Data
- 11.3 Integrating Data from Different Sources
- 11.4 Preparing the Data for Plotting
- 11.5 Plotting with Google Earth™
- 11.6 Extracting Demographic Information from the CIA XML File
- 11.7 Generating KML Directly
- 11.8 Additional Computational Tasks
- 11.9 Exercises
- Bibliography
- Chapter 12 Exploring Data Science Jobs with Web Scraping and Text Mining
- 12.1 Introduction and Motivation
- 12.2 Exploring Different Web Sites
- 12.3 Preliminary/Exploratory Scraping: The Kaggle Job List
- 12.4 Scraping CyberCoders.com
- 12.5 A Reusable Generic Framework for Arbitrary Sites
- 12.6 Scraping Career Builder
- 12.7 Scraping Monster.com
- 12.8 Analyzing the Results: The Important Skills
- 12.9 Note on Web Scraping
- 12.10 Exercises
- Bibliography
- Colophon
Product information
- Title: Data Science in R
- Author(s):
- Release date: April 2015
- Publisher(s): Chapman and Hall/CRC
- ISBN: 9781498759878