K-means clustering in Apache Spark

The MRI brain scans that we will use for our k-means clustering model were downloaded from The Cancer Imaging Archive (TCIA), a service that anonymizes medical images of cancer and hosts a large archive of them for public download. TCIA can be found at http://www.cancerimagingarchive.net/.

The MRI scan of a healthy human brain can be found in the GitHub repository accompanying this book as mri-images-data/mri-healthy-brain.png. The MRI scan of the test human brain is mri-images-data/mri-test-brain.png. We will use both images in the following Spark application to train our k-means clustering model and to apply it to image segmentation. Let's begin:
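The idea of training a k-means model on one scan and then using it to segment another can be illustrated without a Spark cluster. The following is a minimal single-machine sketch in plain Python, not the Spark application from this chapter. It assumes each grayscale pixel is a one-dimensional feature, fits k centroids with Lloyd's algorithm (assign each value to its nearest centroid, then move each centroid to the mean of its cluster), and labels the pixels of a second image by nearest centroid. The function names `train_kmeans_1d`, `nearest`, and `segment`, the evenly spaced initialization, and the toy intensity values are all hypothetical. Spark MLlib's `KMeans` distributes the same assign-and-update loop across a cluster and uses k-means|| initialization by default.

```python
from typing import List, Sequence


def nearest(value: float, centroids: Sequence[float]) -> int:
    """Return the index of the centroid closest to value (squared 1-D distance)."""
    return min(range(len(centroids)), key=lambda i: (value - centroids[i]) ** 2)


def train_kmeans_1d(values: Sequence[float], k: int, max_iter: int = 20) -> List[float]:
    """Fit k centroids to scalar pixel intensities with Lloyd's algorithm.

    Hypothetical helper for illustration; returns centroids sorted by intensity.
    """
    ordered = sorted(values)
    n = len(ordered)
    if k < 1 or k > n:
        raise ValueError("k must be between 1 and the number of values")
    # Assumed deterministic init: evenly spaced picks from the sorted intensities.
    # (Spark's KMeans defaults to k-means|| initialization instead.)
    step = (n - 1) / max(k - 1, 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: put every value in the cluster of its nearest centroid.
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Update step: move each centroid to its cluster mean; keep the old
        # centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return sorted(centroids)


def segment(pixels: Sequence[float], centroids: Sequence[float]) -> List[int]:
    """Label every pixel with the index of its nearest centroid."""
    return [nearest(p, centroids) for p in pixels]


if __name__ == "__main__":
    # Hypothetical toy intensities, NOT taken from the TCIA scans.
    training_pixels = [10, 11, 12, 98, 100, 102, 198, 200, 205]
    model = train_kmeans_1d(training_pixels, k=3)
    print(model)                               # [11.0, 100.0, 201.0]
    print(segment([9, 105, 199, 50], model))   # [0, 1, 2, 0]
```

Sorting the final centroids keeps cluster indices ordered from darkest to brightest, which makes the segment labels easy to interpret. In a real image, each label would be mapped back to its pixel position to produce the segmented picture.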

The following subsections describe ...
