
Practical Data Science Cookbook - Second Edition by Abhijit Dasgupta, Benjamin Bengfort, Sean Patrick Murphy, Tony Ojeda, Prabhanjan Tattar


How to do it...

To clean and explore the data, follow these steps:

  1. Imported numeric data often contains special characters such as percentage signs, dollar signs, and commas. These characters cause R to treat the field as a character field instead of a numeric field. For example, our FINVIZ dataset contains numerous values with percentage signs that must be removed. To do this, we will create a clean_numeric function that strips away unwanted characters with the gsub function and then converts the result to numeric. We will create this function once and then use it multiple times throughout the chapter:
clean_numeric <- function(s){
  # Strip %, $, commas, and parentheses, then convert to numeric
  s <- gsub("%|\\$|,|\\)|\\(", "", s)
  as.numeric(s)
}
  2. Next, we will apply this function to the numeric fields ...
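
To see what the cleaning function does before running it on real data, it can help to try it on a tiny example. The following is only a sketch: the toy data frame and its column names (Ticker, Price, Change) are hypothetical stand-ins, not the actual FINVIZ fields, and applying the function with lapply is one reasonable approach rather than necessarily the one the recipe goes on to use.

```r
# Same cleaning helper as in step 1, repeated so this sketch runs on its own.
clean_numeric <- function(s) {
  s <- gsub("%|\\$|,|\\)|\\(", "", s)
  as.numeric(s)
}

# Hypothetical stand-in for a few imported columns (NOT the real FINVIZ data).
toy <- data.frame(
  Ticker = c("AAA", "BBB"),
  Price  = c("$1,234.50", "(12.00)"),
  Change = c("5.25%", "-0.75%"),
  stringsAsFactors = FALSE
)

# Clean only the columns that should be numeric; leave the text column alone.
numeric_cols <- c("Price", "Change")
toy[numeric_cols] <- lapply(toy[numeric_cols], clean_numeric)

# Inspect the resulting column types.
str(toy)
```

Two behaviors are worth keeping in mind. Percentages are left in percent units (5.25, not 0.0525), and parentheses are simply removed, so a value written as (12.00) becomes a positive 12 rather than a negative number. Any value that still contains non-numeric text after cleaning becomes NA, with a coercion warning from as.numeric.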
