Practical Data Science Cookbook - Second Edition by Abhijit Dasgupta, Benjamin Bengfort, Sean Patrick Murphy, Tony Ojeda, Prabhanjan Tattar

How to do it...

The following program walks through the analysis of the income_dist.csv file step by step.

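  1. Load the dataset income_dist.csv using the read.csv function, then use the functions nrow, str, length, unique, and so on to inspect it:
     # stringsAsFactors=TRUE keeps Country a factor so that levels() works in R 4.0 and later
     id <- read.csv("income_dist.csv",header=TRUE,stringsAsFactors=TRUE)
     nrow(id)               # number of observations
     str(names(id))         # a compact look at the variable names
     length(names(id))      # number of variables
     ncol(id)               # equivalent of previous line
     unique(id$Country)     # distinct countries
     levels(id$Country)     # alternatively
     min(id$Year)           # first year covered
     max(id$Year)           # last year covered
     id_us <- id[id$Country=="United States",]   # keep only the United States rows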

The data is first stored in the R object id (R is case-sensitive, so the name is lowercase). The dataset has 2180 observations (rows) and 354 variables, a few of which are shown with the use of two functions, str ...
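The inspection calls from step 1 can also be tried without the CSV file. The following is a minimal sketch on a small, made-up data frame: only the column names Country and Year come from the recipe, while the object name toy, the Income column, and all values are hypothetical and used purely for illustration.

```r
# Hypothetical stand-in for income_dist.csv. Only Country and Year are
# column names from the recipe; Income and every value here are invented.
toy <- data.frame(
  Country = c("United States", "United States", "Canada", "Canada", "France"),
  Year    = c(1990, 2000, 1990, 2000, 2000),
  Income  = c(10, 12, 8, 9, 7),
  stringsAsFactors = TRUE   # keep Country as a factor so levels() is defined
)

n_rows     <- nrow(toy)                        # number of observations
n_cols     <- ncol(toy)                        # number of variables
same_count <- length(names(toy)) == ncol(toy)  # the two counts agree
countries  <- levels(toy$Country)              # factor levels, sorted alphabetically
year_range <- range(toy$Year)                  # min and max in one call
toy_us     <- toy[toy$Country == "United States", ]  # row filter by country
```

Note that range returns the same two values as calling min and max separately, and that the trailing comma in the row filter keeps all columns.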
