Big Data Meets Survey Science

Book description

Offers a clear view of the utility and place for survey data within the broader Big Data ecosystem 

This book presents a collection of snapshots from two sides of the Big Data perspective. It assembles an array of tangible tools, methods, and approaches that illustrate how Big Data sources and methods are being used in the survey and social sciences to improve official statistics and estimates for human populations. It also provides examples of how survey data are being used to evaluate and improve the quality of insights derived from Big Data.

Big Data Meets Survey Science: A Collection of Innovative Methods shows how survey data and Big Data can be used together to benefit one or more of the data sources involved, with numerous chapters providing consistent illustrations and examples of survey data enriching the evaluation of Big Data sources. It also presents examples of how machine learning, data mining, and other data science techniques are being inserted into virtually every stage of the survey lifecycle. Topics covered include: Total Error Frameworks for Found Data; Performance and Sensitivities of Home Detection on Mobile Phone Data; Assessing Community Wellbeing Using Google Street View and Satellite Imagery; Using Surveys to Build and Assess Registration-Based Sample (RBS) Religious Flags; and more.

  • Presents groundbreaking survey methods being utilized today in the field of Big Data 
  • Explores how machine learning methods can be applied to the design, collection, and analysis of social science data 
  • Filled with examples and illustrations that show how survey data benefit Big Data evaluation
  • Covers methods and applications used in combining Big Data with survey statistics 
  • Examines regulations as well as ethical and privacy issues  

Big Data Meets Survey Science: A Collection of Innovative Methods is an excellent book for both the survey and social science communities as they learn to capitalize on this new revolution. It will also appeal to the broader data and computer science communities looking for new areas of application for emerging methods and data sources. 

Table of contents

  1. Cover
  2. List of Contributors
  3. Introduction
    1. Acknowledgments
    2. References
  4. Section 1: The New Survey Landscape
    1. 1 Why Machines Matter for Survey and Social Science Researchers: Exploring Applications of Machine Learning Methods for Design, Data Collection, and Analysis
      1. 1.1 Introduction
      2. 1.2 Overview of Machine Learning Methods and Their Evaluation
      3. 1.3 Creating Sample Designs and Constructing Sampling Frames Using Machine Learning Methods
      4. 1.4 Questionnaire Design and Evaluation Using Machine Learning Methods
      5. 1.5 Survey Recruitment and Data Collection Using Machine Learning Methods
      6. 1.6 Survey Data Coding and Processing Using Machine Learning Methods
      7. 1.7 Sample Weighting and Survey Adjustments Using Machine Learning Methods
      8. 1.8 Survey Data Analysis and Estimation Using Machine Learning Methods
      9. 1.9 Discussion and Conclusions
      10. References
      11. Further Reading
    2. 2 The Future Is Now: How Surveys Can Harness Social Media to Address Twenty‐first Century Challenges
      1. 2.1 Introduction
      2. 2.2 New Ways of Thinking About Survey Research
      3. 2.3 The Challenge with … Sampling People
      4. 2.4 The Challenge with … Identifying People
      5. 2.5 The Challenge with … Reaching People
      6. 2.6 The Challenge with … Persuading People to Participate
      7. 2.7 The Challenge with … Interviewing People
      8. 2.8 Conclusion
      9. References
    3. 3 Linking Survey Data with Commercial or Administrative Data for Data Quality Assessment
      1. 3.1 Introduction
      2. 3.2 Thinking About Quality Features of Analytic Data Sources
      3. 3.3 Data Used in This Chapter
      4. 3.4 Assessment of Data Quality Using the Linked File
      5. 3.5 Conclusion
      6. References
      7. Further Reading
  5. Section 2: Total Error and Data Quality
    1. 4 Total Error Frameworks for Found Data
      1. 4.1 Introduction
      2. 4.2 Data Integration and Estimation
      3. 4.3 Errors in Datasets
      4. 4.4 Errors in Hybrid Estimates
      5. 4.5 Other Error Frameworks
      6. 4.6 Summary and Conclusions
      7. References
    2. 5 Measuring the Strength of Attitudes in Social Media Data
      1. 5.1 Introduction
      2. 5.2 Methods
      3. 5.3 Results
      4. 5.4 Summary
      5. 5.A 2016 German ESS Questions Used in Analysis
      6. 5.B Search Terms Used to Identify Topics in Reddit Posts (2016 and 2018)
      7. 5.C Example of Coding Steps Used to Identify Topics and Assign Sentiment in Reddit Submissions (2016 and 2018)
      8. References
    3. 6 Attention to Campaign Events: Do Twitter and Self‐Report Metrics Tell the Same Story?
      1. 6.1 What Can Social Media Tell Us About Social Phenomena?
      2. 6.2 The Empirical Evidence to Date
      3. 6.3 Tweets as Public Attention
      4. 6.4 Data Sources
      5. 6.5 Event Detection
      6. 6.6 Did Events Peak at the Same Time Across Data Streams?
      7. 6.7 Were Event Words Equally Prominent Across Data Streams?
      8. 6.8 Were Event Terms Similarly Associated with Particular Candidates?
      9. 6.9 Were Event Trends Similar Across Data Streams?
      10. 6.10 Unpacking Differences Between Samples
      11. 6.11 Conclusion
      12. References
    4. 7 Improving Quality of Administrative Data: A Case Study with FBI's National Incident‐Based Reporting System Data
      1. 7.1 Introduction
      2. 7.2 The NIBRS Database
      3. 7.3 Data Quality Improvement Based on the Total Error Framework
      4. 7.4 Utilizing External Data Sources in Improving Data Quality of the Administrative Data
      5. 7.5 Summary and Future Work
      6. References
    5. 8 Performance and Sensitivities of Home Detection on Mobile Phone Data
      1. 8.1 Introduction
      2. 8.2 Deploying Home Detection Algorithms to a French CDR Dataset
      3. 8.3 Assessing Home Detection Performance at Nationwide Scale
      4. 8.4 Results
      5. 8.5 Discussion and Conclusion
      6. References
  6. Section 3: Big Data in Official Statistics
    1. 9 Big Data Initiatives in Official Statistics
      1. 9.1 Introduction
      2. 9.2 Some Characteristics of the Changing Survey Landscape
      3. 9.3 Current Strategies to Handle the Changing Survey Landscape
      4. 9.4 The Potential of Big Data and the Use of New Methods in Official Statistics
      5. 9.5 Big Data Quality
      6. 9.6 Legal Issues
      7. 9.7 Future Developments
      8. References
    2. 10 Big Data in Official Statistics: A Perspective from Statistics Netherlands
      1. 10.1 Introduction
      2. 10.2 Big Data and Official Statistics
      3. 10.3 Examples of Big Data in Official Statistics
      4. 10.4 Principles for Assessing the Quality of Big Data Statistics
      5. 10.5 Integration of Big Data with Other Statistical Sources
      6. 10.6 Disclosure Control with Big Data
      7. 10.7 The Way Ahead: A Chance for Paradigm Fusion
      8. 10.8 Conclusion
      9. References
      10. Further Reading
    3. 11 Mining the New Oil for Official Statistics
      1. 11.1 Introduction
      2. 11.2 Statistical Inference for Binary Variables from Nonprobability Samples
      3. 11.3 Integrating Data Source B Subject to Undercoverage Bias
      4. 11.4 Integrating Data Sources Subject to Measurement Errors
      5. 11.5 Integrating Probability Sample A Subject to Unit Nonresponse
      6. 11.6 Empirical Studies
      7. 11.7 Examples of Official Statistics Applications
      8. 11.8 Limitations
      9. 11.9 Conclusion
      10. References
      11. Further Reading
    4. 12 Investigating Alternative Data Sources to Reduce Respondent Burden in United States Census Bureau Retail Economic Data Products
      1. 12.1 Introduction
      2. 12.2 Respondent Burden
      3. 12.3 Point‐of‐Sale Data
      4. 12.4 Project Description
      5. 12.5 Summary
      6. Disclaimer
      7. Disclosure
      8. References
  7. Section 4: Combining Big Data with Survey Statistics: Methods and Applications
    1. 13 Effects of Incentives in Smartphone Data Collection
      1. 13.1 Introduction
      2. 13.2 The Influence of Incentives on Participation
      3. 13.3 Institut für Arbeitsmarkt‐ und Berufsforschung (IAB)‐SMART Study Design
      4. 13.4 Results
      5. 13.5 Summary
      6. References
    2. 14 Using Machine Learning Models to Predict Attrition in a Survey Panel
      1. 14.1 Introduction
      2. 14.2 Methods
      3. 14.3 Results
      4. 14.4 Discussion
      5. 14.A Questions Used in the Analysis
      6. References
    3. 15 Assessing Community Wellbeing Using Google Street‐View and Satellite Imagery
      1. 15.1 Introduction
      2. 15.2 Methods
      3. 15.3 Application Results
      4. 15.4 Conclusions
      5. 15.A Amazon Mechanical Turk Questionnaire
      6. 15.B Pictures and Maps
      7. 15.C Descriptive Statistics
      8. 15.D Stepwise AIC OLS Regression Models
      9. 15.E Generalized Linear Models via Penalized Maximum Likelihood with k-Fold Cross-Validation
      10. 15.F Heat Maps - Actual vs. Model-Based Outcomes
      11. References
    4. 16 Nonparametric Bootstrap and Small Area Estimation to Mitigate Bias in Crowdsourced Data: Simulation Study and Application to Perceived Safety
      1. 16.1 Introduction
      2. 16.2 The Rise of Crowdsourcing and Implications
      3. 16.3 Crowdsourcing Data to Analyze Social Phenomena: Limitations
      4. 16.4 Previous Approaches for Reweighting Crowdsourced Data
      5. 16.5 A New Approach: Small Area Estimation Under a Nonparametric Bootstrap Estimator
      6. 16.6 Simulation Study
      7. 16.7 Case Study: Safety Perceptions in London
      8. 16.8 Discussion and Conclusions
      9. References
    5. 17 Using Big Data to Improve Sample Efficiency
      1. 17.1 Introduction and Background
      2. 17.2 Methods to More Efficiently Sample Unregistered Boat‐Owning Households
      3. 17.3 Results
      4. 17.4 Conclusions
      5. Acknowledgments
      6. References
  8. Section 5: Combining Big Data with Survey Statistics: Tools
    1. 18 Feedback Loop: Using Surveys to Build and Assess Registration‐Based Sample Religious Flags for Survey Research
      1. 18.1 Introduction
      2. 18.2 The Turn to Trees
      3. 18.3 Research Agenda
      4. 18.4 Data
      5. 18.5 Combining the Data
      6. 18.6 Building Models
      7. 18.7 Variables
      8. 18.8 Results
      9. 18.9 Considering Systematic Matching Rates
      10. 18.10 Discussion and Conclusions
      11. References
    2. 19 Artificial Intelligence and Machine Learning Derived Efficiencies for Large‐Scale Survey Estimation Efforts
      1. 19.1 Introduction
      2. 19.2 Background
      3. 19.3 Accelerating the MEPS Imputation Processes: Development of Fast‐Track MEPS Analytic Files
      4. 19.4 Building the Prototype
      5. 19.5 An Artificial Intelligence Approach to Fast‐Track MEPS Imputation
      6. 19.6 Summary
      7. Acknowledgments
      8. References
    3. 20 Worldwide Population Estimates for Small Geographic Areas: Can We Do a Better Job?
      1. 20.1 Introduction
      2. 20.2 Background
      3. 20.3 Gridded Population Estimates
      4. 20.4 Population Estimates in Surveys
      5. 20.5 Case Study
      6. 20.6 Conclusions and Next Steps
      7. Acknowledgments
      8. References
  9. Section 6: The Fourth Paradigm, Regulations, Ethics, Privacy
    1. 21 Reproducibility in the Era of Big Data: Lessons for Developing Robust Data Management and Data Analysis Procedures
      1. 21.1 Introduction
      2. 21.2 Big Data
      3. 21.3 Challenges Researchers Face in the Era of Big Data and Reproducibility
      4. 21.4 Reproducibility
      5. 21.5 Reliability and Validity of Administrative Data
      6. 21.6 Data and Methods
      7. 21.7 Discussion
      8. References
      9. Further Reading
    2. 22 Combining Active and Passive Mobile Data Collection: A Survey of Concerns
      1. 22.1 Introduction
      2. 22.2 Previous Research
      3. 22.3 Methods and Data
      4. 22.4 Results
      5. 22.5 Conclusion
      6. 22.A Appendix
      7. 22.B Appendix
      8. Funding
      9. References
    3. 23 Attitudes Toward Data Linkage: Privacy, Ethics, and the Potential for Harm
      1. 23.1 Introduction: Big Data and the Federal Statistical System in the United States
      2. 23.2 Data and Methods
      3. 23.3 Results
      4. 23.4 Discussion: Toward an Ethical Framework
      5. References
    4. 24 Moving Social Science into the Fourth Paradigm: The Data Life Cycle
      1. 24.1 Consequences and Reality of the Availability of Big Data and Massive Compute Power for Survey Research and Social Science
      2. 24.2 Technical Challenges for Data‐Intensive Social Science Research
      3. 24.3 The Solution: Social Science Researchers Become “Data‐Aware”
      4. 24.4 Data Awareness
      5. 24.5 Bridge the Gap Between Silos
      6. 24.6 Conclusion
      7. References
  10. Index
  11. End User License Agreement

Product information

  • Title: Big Data Meets Survey Science
  • Author(s): Craig A. Hill, Paul P. Biemer, Trent D. Buskirk, Lilli Japec, Antje Kirchner, Stas Kolenikov, Lars E. Lyberg
  • Release date: September 2020
  • Publisher(s): Wiley
  • ISBN: 9781118976326