APPENDIX: DATA SUMMARIZATION AND VISUALIZATION

Here we present a very brief review of methods for summarizing and visualizing data. For deeper coverage, please see Discovering Statistics by Daniel T. Larose (W.H. Freeman, second edition, 2013).

PART 1: SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS

  • Descriptive statistics refers to methods for summarizing and organizing the information in a data set.

Consider Table A.1, which we will use to illustrate some statistical concepts.

  • The entities for which information is collected are called the elements. In Table A.1, the elements are the 10 applicants. Elements are also called cases or subjects.
  • A variable is a characteristic of an element, which takes on different values for different elements. The variables in Table A.1 are marital status, mortgage, income, rank, year, and risk. Variables are also called attributes.
  • The set of variable values for a particular element is an observation. Observations are also called records. The observation for Applicant 2 is:

     

    Applicant  Marital Status  Mortgage  Income ($)  Income Rank  Year  Risk
    2          Married         Yes       32,000      7            2010  Good
  • Variables can be either qualitative or quantitative.
    • A qualitative variable enables the elements to be classified or categorized according to some characteristic. The qualitative variables in Table A.1 are marital status, mortgage, rank, and risk. Qualitative variables are also called categorical variables.
    • A quantitative variable takes numeric values and ...
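
The vocabulary above can be made concrete with a minimal Python sketch. It stores only the one observation quoted in the text (Applicant 2) and does not reproduce the rest of Table A.1. The dictionary keys and the helper name variable_kind are illustrative assumptions, not names from the book. Variables are tagged as qualitative only where the text explicitly lists them as such.

```python
# The single observation quoted in the text (Applicant 2 of Table A.1).
# The other nine applicants are not reproduced here.
# Key names are illustrative assumptions, not taken from the book.
observation = {
    "applicant": 2,            # identifies the element (case, subject)
    "marital_status": "Married",
    "mortgage": "Yes",
    "income": 32000,           # dollars
    "income_rank": 7,
    "year": 2010,
    "risk": "Good",
}

# Variables the text explicitly lists as qualitative (categorical).
QUALITATIVE = {"marital_status", "mortgage", "income_rank", "risk"}


def variable_kind(name):
    """Hypothetical helper: report whether the text lists a variable as qualitative."""
    return "qualitative" if name in QUALITATIVE else "not listed as qualitative"


# Every key except the applicant identifier is a variable (attribute).
variables = [name for name in observation if name != "applicant"]
kinds = {name: variable_kind(name) for name in variables}

# Print each variable, its value for Applicant 2, and its kind.
for name in variables:
    print(f"{name:15} {observation[name]!s:10} {kinds[name]}")
```

Note that income rank holds a number yet is treated as qualitative, so storing a value as a number does not by itself make the variable quantitative.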
