How to do it...

To clean and explore the data, follow these steps closely:

  1. Imported numeric data often contains special characters such as percent signs, dollar signs, commas, and parentheses. These characters cause R to treat the field as a character field instead of a numeric field. For example, our FINVIZ dataset contains numerous values with percent signs that must be removed. To do this, we will create a clean_numeric function that strips away the unwanted characters using the gsub function. We will create this function once and then reuse it throughout the chapter:
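clean_numeric <- function(s){
  # Remove percent signs, dollar signs, commas, and parentheses
  s <- gsub("%|\\$|,|\\)|\\(", "", s)
  # Convert the cleaned strings to numbers and return them explicitly
  as.numeric(s)
}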
  2. Next, we will apply this function to the numeric fields ...
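The following is a minimal sketch of how clean_numeric behaves, and of one way to apply it to several columns at once. The data frame `df` and its column names (`Ticker`, `Change`, `Price`) are hypothetical stand-ins invented for illustration, not the actual FINVIZ fields, and using `lapply` over a column subset is an assumed approach rather than necessarily the one used in the next step.

```r
# Same cleaning function as above, repeated so this example is self-contained
clean_numeric <- function(s){
  # Remove percent signs, dollar signs, commas, and parentheses
  s <- gsub("%|\\$|,|\\)|\\(", "", s)
  # Convert the cleaned strings to numbers
  as.numeric(s)
}

# Cleaning a character vector directly
clean_numeric(c("12.5%", "$1,234.56", "(3.2)"))

# Hypothetical data frame whose numeric fields were imported as text
df <- data.frame(
  Ticker = c("A", "B"),
  Change = c("1.5%", "-0.3%"),
  Price  = c("$10.00", "$1,200.50"),
  stringsAsFactors = FALSE
)

# Apply clean_numeric to every column listed in numeric_cols (hypothetical names)
numeric_cols <- c("Change", "Price")
df[numeric_cols] <- lapply(df[numeric_cols], clean_numeric)

str(df)
```

Note that because the pattern simply deletes parentheses, an accounting-style negative such as "(3.2)" becomes the positive value 3.2. If your data uses that convention, you would need to handle it before stripping the characters.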
