We have included our full regression model in Appendix B. For this year’s report, we have made two important changes to the basic, parsimonious linear model we presented in the 2015 report. We have included: 1) external geographic data (GDP by US state and country), and 2) a square root transformation. The transformation adds one step to the linear model: we add up the model coefficients, and then square the result. Both of these changes significantly improve the accuracy of the salary estimates.
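To make the extra step concrete, here is a minimal Python sketch of how an estimate would be computed under a square root transformation. The intercept, feature names, and coefficient values are hypothetical placeholders chosen for easy arithmetic; they are not the values in Appendix B.

```python
# Hypothetical coefficients in units of sqrt(dollars).
# These are illustrative assumptions, NOT the values from Appendix B.
INTERCEPT = 200.0
COEFFICIENTS = {
    "years_experience": 3.0,  # added per year of experience
    "region_gdp": 0.5,        # added per thousand dollars of per capita GDP
}


def predict_salary(features):
    """Add up the applicable coefficients, then square the result."""
    linear_sum = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    # The model is fit on sqrt(salary), so squaring returns to dollars.
    return linear_sum ** 2


estimate = predict_salary({"years_experience": 10, "region_gdp": 50})
print(estimate)  # (200 + 30 + 25) squared = 65025.0
```

Because the sum is squared, a given coefficient adds more dollars when the rest of the sum is already large, which is one reason a transformed model can fit salary data better than a purely additive one.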
Our model explains about three-quarters of the variance in the sample salaries (with an R² of 0.747). Roughly half of the salary variance is due to geography and experience. Given the important factors that cannot be captured in the survey (for example, we don’t measure competence or evaluate the quality of respondents’ work output), it’s not surprising that roughly a quarter of the variance is left unexplained.
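For readers unfamiliar with the statistic, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The short Python sketch below shows the calculation on made-up numbers; the salary values are invented for illustration and are not survey data.

```python
def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the observed values around their mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Variation left over after the model's predictions are subtracted.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Invented salaries in thousands of dollars, for illustration only.
actual = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]
score = r_squared(actual, predicted)
print(round(score, 3))  # 1 - 10/500 = 0.98
```

An R² of 0.747 means the residual sum of squares is about 25% of the total, which is the unexplained portion discussed above.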
Geography has a huge impact on salary, but the model cannot capture it adequately because many countries have too few respondents. For example, if a country is represented by only one or two respondents, this isn’t enough to justify giving the country its own coefficient. For this reason, we use broad regional coefficients (e.g., “Asia” or “Eastern Europe”). Economic differences within a region are huge, however, so the accuracy of the model suffers.
To get around this problem, we’ve used publicly available records of per capita GDP of ...