Chapter 10. Clustering and Customer Segmentation on Big Data

Up until now, we have worked only with prelabeled data, that is, in a supervised setting. We trained our machine learning models on that labeled data and used them to predict results. But what if the data is not labeled at all and we just get plain data? In that case, can we carry out any useful analysis of the data at all? Drawing conclusions from an unlabeled dataset is an example of unsupervised learning, where the machine learning algorithm makes deductions or predictions from raw, unlabeled data. One of the most popular approaches to analyzing unlabeled data is to find groups of similar items within a dataset. This grouping of data has several advantages and use cases, ...
