Data Manipulation with R

Book Description

Written for intermediate to advanced users of R, this tutorial will considerably enhance your data manipulation capabilities. It takes you step by step through the tools and techniques needed to prepare data for analysis and visualization.

In Detail

One of the most important aspects of computing with data is the ability to manipulate it to enable subsequent analysis and visualization. R offers a wide range of tools for this purpose. Data from any source, whether flat files or databases, can be loaded into R and reshaped into structures that support reproducible and convenient data analysis.

This practical, example-oriented guide discusses the split-apply-combine strategy for data manipulation, in which a dataset is split into groups, a function is applied to each group, and the results are combined. The book presents this as a faster approach to data manipulation. After reading it, you will be able to manage your datasets and check their validity efficiently with the split-apply-combine strategy, and you will also learn to handle larger datasets.

The book starts by describing the mode and class of R objects, then covers the different R data types and their basic operations. You will focus on group-wise data manipulation with the split-apply-combine strategy, supported by specific examples. You will also learn to handle date, string, and factor variables efficiently, and to work with different dataset layouts using the reshape2 package. You will learn to use plyr effectively for data manipulation, including truncating and rounding data, simulating datasets, and character manipulation. Finally, you will get acquainted with using R with SQL databases.

What You Will Learn

  • Learn R data types and their basic operations
  • Deal efficiently with strings, factors, and dates
  • Understand group-wise data manipulation
  • Work with different layouts of the R dataset and interchange between layouts for different purposes
  • Connect R with database software to manage relational databases
  • Manage bigger datasets using R
  • Manipulate datasets using SQL statements through the sqldf package

Table of Contents

  1. Data Manipulation with R
    1. Table of Contents
    2. Data Manipulation with R
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. R Data Types and Basic Operations
      1. Modes and classes of R objects
      2. R object structure and mode conversion
      3. Vector
      4. Factor and its types
        1. Data frame
        2. Matrices
        3. Arrays
        4. List
      5. Missing values in R
      6. Summary
    9. 2. Basic Data Manipulation
      1. Acquiring data
      2. Factor manipulation
        1. Factors from numeric variables
      3. Date processing
      4. Character manipulation
      5. Subscripting and subsetting
      6. Summary
    10. 3. Data Manipulation Using plyr
      1. The split-apply-combine strategy
        1. Split-apply-combine without a loop
        2. Split-apply-combine with a loop
      2. Utilities of plyr
        1. Intuitive function names
        2. Input and arguments
      3. Comparing default R and plyr
        1. Multiargument functions
      4. Summary
    11. 4. Reshaping Datasets
      1. The typical layout of a dataset
        1. Long layout
        2. Wide layout
      2. The new layout of a dataset
      3. Reshaping the dataset from the typical layout
      4. Reshaping the dataset with the reshape package
        1. Melting data
          1. Missing values in molten data
        2. Casting molten data
      5. The reshape2 package
      6. Summary
    12. 5. R and Databases
      1. R and different databases
        1. R and Excel
        2. R and MS Access
      2. Relational databases in R
        1. The filehash package
        2. The ff package
      3. R and sqldf
      4. Data manipulation using sqldf
      5. Summary
    13. A. Bibliography
    14. Index