# Setting up the missing values test dataset

We will start with two groups of generated survey data. One group is males, who have a 3% probability of not answering the age question, and the other group is females, who have a 5% probability of not answering it:

```r
library(wakefield)
library(dplyr)

# Generate some data for females with a 5% missing value rate for age.
# prob = c(0, 1) over x = c("M", "F") makes every row "F".
set.seed(10)
f.df <- r_data_frame(
  n = 1000,
  age,
  gender(x = c("M", "F"), prob = c(0, 1), name = "Gender"),
  education
) %>%
  r_na(col = 1, prob = .05)  # column 1 is the age column

# str(f.df)
summary(f.df)

# Generate some data for males with a 3% missing value rate for age.
# prob = c(1, 0) over x = c("M", "F") makes every row "M".
set.seed(20)
m.df <- r_data_frame(
  n = 1000,
  age,
  gender(x = c("M", "F"), prob = c(1, 0), name = "Gender"),
  education ...
```
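To make the idea of a per-group non-response probability concrete, here is a minimal base-R sketch of the same mechanism. It is not wakefield's implementation: `inject_na` is a hypothetical helper, and the age range of 18 to 89 is an assumption chosen only for illustration.

```r
# Hypothetical helper (not part of wakefield): blank out each value
# independently with probability `prob`.
inject_na <- function(x, prob) {
  x[runif(length(x)) < prob] <- NA
  x
}

set.seed(10)
# Assumed age range, for illustration only.
age <- sample(18:89, 1000, replace = TRUE)

# 5% non-response, as for the female group in the text.
age_f <- inject_na(age, prob = 0.05)

# Observed share of missing ages. Because each value is dropped
# independently, this is close to 0.05 but rarely exactly 0.05.
na_rate <- mean(is.na(age_f))
na_rate
```

The same check, `mean(is.na(...))` on the age column, can be run on each generated data frame to confirm that the observed missing rate is near the intended one.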
