3.6 EXERCISES

A set of 10 hypothetical patient records from a large database is presented in Table 3.9. Patients with a Diabetes value of 1 have type II diabetes; patients with a value of 0 do not. It is anticipated that this data set will be used to predict diabetes from measurements of age, systolic blood pressure, diastolic blood pressure, and weight.

  1. Assign each of the following variables from Table 3.9 to one of these categories: constant, dichotomous, binary, discrete, or continuous.
    1. Name
    2. Age
    3. Gender
    4. Blood group
    5. Weight (kg)
    6. Height (m)
    7. Systolic blood pressure
    8. Diastolic blood pressure
    9. Temperature
    10. Diabetes
  2. Assign each of the following variables to one of these scales: nominal, ordinal, interval, or ratio.
    1. Name
    2. Age
    3. Gender
    4. Blood group
    5. Weight (kg)
    6. Height (m)
    7. Systolic blood pressure
    8. Diastolic blood pressure
    9. Temperature
    10. Diabetes
  3. On the basis of the anticipated use of the data to build a predictive model, identify:
    1. A label for the observations
    2. The descriptor variables
    3. The response variable
  4. Create a new column by normalizing the Weight (kg) variable into the range 0 to 1 using min-max normalization.
  5. Create a new column by binning the Weight variable into three categories: low (less than 60 kg), medium (60–100 kg), and high (greater than 100 kg).

    Table 3.9. Table of patient records

  6. Create an aggregated column, ...
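The transformations in exercises 4 and 5 can be sketched in Python as follows. This is a minimal sketch, assuming the Weight (kg) values from Table 3.9 have already been loaded into a list of floats; the function names `min_max_normalize` and `bin_weight` are hypothetical, and counting exactly 60 kg and exactly 100 kg as "medium" is one reading of the stated ranges. Min-max normalization rescales each value as (x - min) / (max - min), so the smallest weight maps to 0 and the largest to 1.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly into [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: the formula divides by zero, so map every value
        # to 0.0 (an assumed convention, not one stated in the exercise).
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def bin_weight(weight_kg: float) -> str:
    """Assign a weight to the exercise 5 categories:
    low (< 60 kg), medium (60 to 100 kg inclusive), high (> 100 kg)."""
    if weight_kg < 60:
        return "low"
    if weight_kg <= 100:
        return "medium"
    return "high"


if __name__ == "__main__":
    # Hypothetical weights for illustration only; these are NOT the Table 3.9 values.
    weights = [50.0, 75.0, 100.0]
    print(min_max_normalize(weights))          # → [0.0, 0.5, 1.0]
    print([bin_weight(w) for w in weights])    # → ['low', 'medium', 'medium']
```

Because the minimum and maximum are taken from the data being normalized, a new record whose weight falls outside that range would be mapped to a value below 0 or above 1.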

From Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining.