We start with two groups of generated data. One group is males, who have a 3% probability of not responding to the age question in a survey; the other is females, who have a 5% probability of not responding:
library(wakefield)
library(dplyr)

# generate some data for females (prob = c(0, 1) selects "F") with 5% missing values for age
set.seed(10)
f.df <- r_data_frame(
  n = 1000,
  age,
  gender(x = c("M", "F"), prob = c(0, 1), name = "Gender"),
  education
) %>%
  r_na(col = 1, prob = .05)

# str(f.df)
summary(f.df)

set.seed(20)
# generate some data for males (prob = c(1, 0) selects "M") with 3% missing values for age
m.df <- r_data_frame(
  n = 1000,
  age,
  gender(x = c("M", "F"), prob = c(1, 0), name = "Gender"),
  education
...