9.2 EXAMPLE

9.2.1 Problem Overview

To illustrate the process described in this book, we will use an example data set from Newman (1998): The Pima Indian Diabetic Database. This set is extracted from a database generated by The National Institute of Diabetes and Digestive and Kidney Diseases of the NIH. The data set contains observations on 768 female patients between the ages of 21 and 81, and specifies whether each patient developed diabetes within five years. The following describes a hypothetical analysis scenario to illustrate the process of making sense of data.

9.2.2 Problem Definition

Objectives

Diabetes is a major cause of morbidity (for example, blindness or kidney failure) among female Pima Indians of Arizona. It is also one of the leading causes of death. The objective of the analysis is to understand any general relationships between different patient characteristics and the propensity to develop diabetes, specifically:

  • Objective 1: Understand differences in the measurements recorded between the group that develops diabetes and the group that does not.
  • Objective 2: Identify associations between the different factors and the development of diabetes that could be used for education and intervention purposes. Any associations need to make use of general categories, such as high blood pressure, to be useful.
  • Objective 3: Develop a predictive model to estimate whether a patient will develop diabetes.
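
As a rough illustration of how the first two objectives might translate into analysis steps, the following minimal sketch compares group means (Objective 1) and maps a measurement to a general category (Objective 2). Everything in it is an assumption made for illustration: the records are invented and are not values from the data set, the field names (plasma_glucose, bmi, develops_diabetes) are hypothetical, and the cut-off of 140 is an arbitrary placeholder rather than a clinical threshold.

```python
from statistics import mean

# Made-up records for illustration only; NOT values from the data set.
# Field names are hypothetical stand-ins for the recorded measurements.
records = [
    {"plasma_glucose": 150, "bmi": 30.0, "develops_diabetes": True},
    {"plasma_glucose": 90, "bmi": 25.0, "develops_diabetes": False},
    {"plasma_glucose": 170, "bmi": 35.0, "develops_diabetes": True},
    {"plasma_glucose": 100, "bmi": 27.0, "develops_diabetes": False},
]


def group_means(rows, outcome_key, fields):
    """Objective 1: mean of each field, split by outcome group."""
    summary = {}
    for outcome in (True, False):
        group = [r for r in rows if r[outcome_key] is outcome]
        summary[outcome] = {f: mean(r[f] for r in group) for f in fields}
    return summary


def to_category(value, threshold):
    """Objective 2: map a measurement to a general category.

    The threshold is a hypothetical cut-off, not a clinical one.
    """
    return "high" if value >= threshold else "not high"


means = group_means(records, "develops_diabetes", ["plasma_glucose", "bmi"])
categories = [to_category(r["plasma_glucose"], 140) for r in records]

print(means)
print(categories)
```

Working with general categories such as "high" rather than raw values is what would make any associations found under Objective 2 usable for education and intervention; the choice of cut-offs would need to come from domain knowledge rather than from this sketch.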

The success criterion is whether the work results in a decrease in ...
