Apache Spark 2.x Machine Learning Cookbook by Shuen Mei, Broderick Hall, Meenakshi Rajendran, Siamak Amirghodsi

How to do it...

  1. The Cleveland Heart Disease database is a published dataset widely used by ML researchers. It contains more than a dozen fields, and experiments with it have concentrated on distinguishing the presence of heart disease (values 1, 2, 3, 4) from its absence (value 0), as recorded in the goal column (the 14th column, num).

  2. The Cleveland Heart Disease dataset is available at http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data.

  3. The dataset contains the following 14 attributes, listed in column order: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, num.

For a detailed explanation on the individual attributes, refer ...
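
The column order from step 3 and the presence/absence rule from step 1 can be sketched in a few lines of plain Python. This is only an illustration, not the recipe's own Spark code. The helper names (`parse_row`, `to_binary_label`, `load_rows`) are hypothetical, and the sample rows are made-up records in the file's comma-separated layout, not data quoted from the dataset. The sketch also assumes that the file has no header row and that a missing value is written as `?`.

```python
import csv
import io

# Column names in the order given in step 3 (the file itself is assumed
# to carry no header row).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def parse_row(fields):
    """Map one comma-separated record to a dict of floats.

    Assumption: a missing value is written as '?', and is stored as None.
    """
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {
        name: (None if value.strip() == "?" else float(value))
        for name, value in zip(COLUMNS, fields)
    }


def to_binary_label(num):
    """Collapse the goal column: 0 means absence, any positive value presence."""
    if num is None:
        raise ValueError("goal column (num) is missing")
    return 1 if num > 0 else 0


def load_rows(text):
    """Parse CSV text and attach a binary 'label' to every record."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        row = parse_row(fields)
        row["label"] = to_binary_label(row["num"])
        rows.append(row)
    return rows


# Hypothetical sample records for illustration only.
SAMPLE = """\
63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
57.0,0.0,4.0,140.0,241.0,0.0,0.0,123.0,1.0,0.2,2.0,?,7.0,1
"""

rows = load_rows(SAMPLE)
for row in rows:
    print(row["age"], row["num"], row["label"])
```

Testing `num > 0` rather than listing the positive values keeps the label rule correct regardless of how many severity levels appear in the goal column.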
