Statistics for Big Data For Dummies

Book Description

The fast and easy way to make sense of statistics for big data

Does the subject of data analysis make you dizzy? You've come to the right place! Statistics For Big Data For Dummies breaks this often-overwhelming subject down into easily digestible parts, offering new and aspiring data analysts the foundation they need to be successful in the field. Inside, you'll find an easy-to-follow introduction to exploratory data analysis; the lowdown on collecting, cleaning, and organizing data; everything you need to know about interpreting data using common software and programming languages; plain-English explanations of how to make sense of data in the real world; and much more.

Data has never been easier to come by, and the tools students and professionals need to enter the world of big data are based on applied statistics. While the word "statistics" alone can evoke feelings of anxiety in even the most confident student or professional, it doesn't have to. Written in the familiar and friendly tone that has defined the For Dummies brand for more than twenty years, Statistics For Big Data For Dummies takes the intimidation out of the subject, offering clear explanations and tons of step-by-step instruction to help you make sense of data mining—without losing your cool.

  • Helps you to identify valid, useful, and understandable patterns in data

  • Provides guidance on extracting previously unknown information from large databases

  • Shows you how to discover patterns available in big data

  • Gives you access to the latest tools and techniques for working in big data

If you're a student enrolled in a related applied statistics course or a professional looking to expand your skill set, Statistics For Big Data For Dummies gives you access to everything you need to succeed.

    Table of Contents

      1. Cover
      2. Introduction
        1. About This Book
        2. Foolish Assumptions
        3. Icons Used in This Book
        4. Beyond the Book
        5. Where to Go From Here
      3. Part I: Introducing Big Data Statistics
        1. Chapter 1: What Is Big Data and What Do You Do with It?
          1. Characteristics of Big Data
          2. Exploratory Data Analysis (EDA)
          3. Statistical Analysis of Big Data
        2. Chapter 2: Characteristics of Big Data: The Three Vs
          1. Characteristics of Big Data
          2. Traditional Database Management Systems (DBMS)
        3. Chapter 3: Using Big Data: The Hot Applications
          1. Big Data and Weather Forecasting
          2. Big Data and Healthcare Services
          3. Big Data and Insurance
          4. Big Data and Finance
          5. Big Data and Electric Utilities
          6. Big Data and Higher Education
          7. Big Data and Retailers
          8. Big Data and Search Engines
          9. Big Data and Social Media
        4. Chapter 4: Understanding Probabilities
          1. The Core Structure: Probability Spaces
          2. Discrete Probability Distributions
          3. Continuous Probability Distributions
          4. Introducing Multivariate Probability Distributions
        5. Chapter 5: Basic Statistical Ideas
          1. Some Preliminaries Regarding Data
          2. Summary Statistical Measures
          3. Overview of Hypothesis Testing
          4. Higher-Order Measures
      4. Part II: Preparing and Cleaning Data
        1. Chapter 6: Dirty Work: Preparing Your Data for Analysis
          1. Passing the Eye Test: Does Your Data Look Correct?
          2. Being Careful with Dates
          3. Does the Data Make Sense?
          4. Frequently Encountered Data Headaches
          5. Other Common Data Transformations
        2. Chapter 7: Figuring the Format: Important Computer File Formats
          1. Spreadsheet Formats
          2. Database Formats
        3. Chapter 8: Checking Assumptions: Testing for Normality
          1. Goodness of fit test
          2. Jarque-Bera test
        4. Chapter 9: Dealing with Missing or Incomplete Data
          1. Missing Data: What’s the Problem?
          2. Techniques for Dealing with Missing Data
        5. Chapter 10: Sending Out a Posse: Searching for Outliers
          1. Testing for Outliers
          2. Robust Statistics
          3. Dealing with Outliers
      5. Part III: Exploratory Data Analysis (EDA)
        1. Chapter 11: An Overview of Exploratory Data Analysis (EDA)
          1. Graphical EDA Techniques
          2. EDA Techniques for Testing Assumptions
          3. Quantitative EDA Techniques
        2. Chapter 12: A Plot to Get Graphical: Graphical Techniques
          1. Stem-and-Leaf Plots
          2. Scatter Plots
          3. Box Plots
          4. Histograms
          5. Quantile-Quantile (QQ) Plots
          6. Autocorrelation Plots
        3. Chapter 13: You’re the Only Variable for Me: Univariate Statistical Techniques
          1. Counting Events Over a Time Interval: The Poisson Distribution
          2. Continuous Probability Distributions
        4. Chapter 14: To All the Variables We’ve Encountered: Multivariate Statistical Techniques
          1. Testing Hypotheses about Two Population Means
          2. Using Analysis of Variance (ANOVA) to Test Hypotheses about Population Means
          3. The F-Distribution
          4. F-Test for the Equality of Two Population Variances
          5. Correlation
        5. Chapter 15: Regression Analysis
          1. The Fundamental Assumption: Variables Have a Linear Relationship
          2. Defining the Population Regression Equation
          3. Estimating the Population Regression Equation
          4. Testing the Estimated Regression Equation
          5. Using Statistical Software
          6. Assumptions of Simple Linear Regression
          7. Multiple Regression Analysis
          8. Multicollinearity
        6. Chapter 16: When You’ve Got the Time: Time Series Analysis
          1. Key Properties of a Time Series
          2. Forecasting with Decomposition Methods
          3. Smoothing Techniques
          4. Seasonal Components
          5. Modeling a Time Series with Regression Analysis
          6. Comparing Different Models: MAD and MSE
      6. Part IV: Big Data Applications
        1. Chapter 17: Using Your Crystal Ball: Forecasting with Big Data
          1. ARIMA Modeling
          2. Simulation Techniques
        2. Chapter 18: Crunching Numbers: Performing Statistical Analysis on Your Computer
          1. Excelling at Excel
          2. Programming with Visual Basic for Applications (VBA)
          3. R, Matey!
        3. Chapter 19: Seeking Free Sources of Financial Data
          1. Yahoo! Finance
          2. Federal Reserve Economic Data (FRED)
          3. Board of Governors of the Federal Reserve System
          4. U.S. Department of the Treasury
          5. Other Useful Financial Websites
      7. Part V: The Part of Tens
        1. Chapter 20: Ten (or So) Best Practices in Data Preparation
          1. Check Data Formats
          2. Verify Data Types
          3. Graph Your Data
          4. Verify Data Accuracy
          5. Identify Outliers
          6. Deal with Missing Values
          7. Check Your Assumptions about How the Data Is Distributed
          8. Back Up and Document Everything You Do
        2. Chapter 21: Ten (or So) Questions Answered by Exploratory Data Analysis (EDA)
          1. What Are the Key Properties of a Dataset?
          2. What’s the Center of the Data?
          3. How Much Spread Is There in the Data?
          4. Is the Data Skewed?
          5. What Distribution Does the Data Follow?
          6. Are the Elements in the Dataset Uncorrelated?
          7. Does the Center of the Dataset Change Over Time?
          8. Does the Spread of the Dataset Change Over Time?
          9. Are There Outliers in the Data?
          10. Does the Data Conform to Our Assumptions?
      8. About the Authors
      9. Cheat Sheet
      10. Advertisement Page
      11. Connect with Dummies
      12. End User License Agreement