Apache Mahout Clustering Designs by Ashish Gupta

Chapter 9. Creating a Cluster Model for Production

So far, we have covered the clustering algorithms that Mahout offers, how to apply them to different datasets, and how to evaluate and improve cluster quality. In this final chapter, we look at how to create a production-ready clustering model.

In this chapter, we will take one real-world scenario and discuss the following points:

  • Preparing the dataset
  • Launching the Mahout job on the cluster
  • Performance tuning for the job

Preparing the dataset

Dataset preparation is the most important task in any machine learning activity. Not every use case provides text or structured data. Collecting the data in the system where you are ...
