P-values from small samples have long been a concern, but large samples can mislead as well: the p-value may be correct, yet the magnitude of the change (the effect) can be very small.
Take an example in which you compare lottery winnings that average either $1,000,000 or $1,000,001.
We will generate two random samples:
- X contains 1 million observations with a mean value of 1,000,000
- Y also contains 1 million observations but with a mean value of 1,000,001, which is only a 1-unit difference
Generate a dataframe with X and Y and print some summary statistics:
set.seed(1020)
lottery <- data.frame(cbind(x = rnorm(n = 1000000, 1000000, 100), y = rnorm(1000000, 1000001, 100) ...
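To make the contrast between statistical significance and effect size concrete, here is a minimal, self-contained sketch. It reuses the sample sizes, means, and standard deviation from the example, but the Welch t-test and Cohen's d are assumptions chosen for illustration, not necessarily the analysis used in the rest of this post. The names `test`, `pooled_sd`, and `cohens_d` are likewise illustrative.

```r
# Sketch only: n, means, and sd mirror the lottery example;
# the Welch t-test and Cohen's d are assumed choices for illustration.
set.seed(1020)
n <- 1000000
x <- rnorm(n, mean = 1000000, sd = 100)
y <- rnorm(n, mean = 1000001, sd = 100)

# Welch two-sample t-test (the default when t.test() gets two vectors)
test <- t.test(y, x)

# Cohen's d: mean difference divided by the pooled standard deviation
# (simple average of variances, valid here because group sizes are equal)
pooled_sd <- sqrt((var(x) + var(y)) / 2)
cohens_d <- (mean(y) - mean(x)) / pooled_sd

print(test$p.value)  # statistical significance
print(cohens_d)      # practical magnitude of the effect
```

With samples this large, the standard error of the difference is roughly 0.14, so a 1-unit gap is easy to detect, while the standardized effect stays near 0.01 (a 1-unit difference against a standard deviation of 100).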