An example of a document clustering application
This application will read a set of documents and organize them using the k-means clustering algorithm. To achieve this, we will use four components:
- The Reader system: This system will read all the documents and convert every document into a list of String objects.
- The Indexer system: This system will process the documents and convert them into a list of words. At the same time, it will generate the global vocabulary of the set of documents, with all the words that appear in them.
- The Mapper system: This system will convert each list of words into a mathematical representation using the vector space model. The value of each item will be the Tf-Idf (short for term frequency–inverse document frequency ...
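The Indexer and Mapper steps can be sketched sequentially in plain Java. This is a minimal illustration, not the book's concurrent implementation. It assumes that each document arrives from the Reader as a list of lines, that a word is a lowercase run of letters or digits, and that the weight is the common variant tf × ln(N / df), where N is the number of documents and df is the number of documents containing the word. The class `TfIdfSketch` and its methods are hypothetical names chosen for this example.

```java
import java.util.*;

/**
 * Hypothetical sequential sketch of the Indexer and Mapper steps.
 * Assumptions: words are lowercase runs of letters/digits, and the
 * weight of a word is tf * ln(N / df).
 */
public class TfIdfSketch {

    // Indexer step: turn each document (a list of lines) into a list of lowercase words.
    public static List<List<String>> index(List<List<String>> documents) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> lines : documents) {
            List<String> words = new ArrayList<>();
            for (String line : lines) {
                for (String token : line.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (!token.isEmpty()) {
                        words.add(token); // skip the empty token produced by leading separators
                    }
                }
            }
            result.add(words);
        }
        return result;
    }

    // Global vocabulary: every distinct word with the number of documents it appears in.
    // A TreeMap keeps the words sorted, so each word gets a stable vector position.
    public static SortedMap<String, Integer> documentFrequencies(List<List<String>> docs) {
        SortedMap<String, Integer> df = new TreeMap<>();
        for (List<String> words : docs) {
            for (String w : new HashSet<>(words)) { // count each word once per document
                df.merge(w, 1, Integer::sum);
            }
        }
        return df;
    }

    // Mapper step: one Tf-Idf vector per document, one dimension per vocabulary word.
    public static double[][] tfIdf(List<List<String>> docs, SortedMap<String, Integer> df) {
        List<String> vocabulary = new ArrayList<>(df.keySet());
        int n = docs.size();
        double[][] vectors = new double[n][vocabulary.size()];
        for (int d = 0; d < n; d++) {
            // Term frequency: raw count of each word in this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String w : docs.get(d)) {
                tf.merge(w, 1, Integer::sum);
            }
            for (int j = 0; j < vocabulary.size(); j++) {
                String w = vocabulary.get(j);
                int count = tf.getOrDefault(w, 0);
                if (count > 0) {
                    vectors[d][j] = count * Math.log((double) n / df.get(w));
                }
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
                Arrays.asList("Java threads and locks", "threads share memory"),
                Arrays.asList("Streams in Java 8"),
                Arrays.asList("Clustering documents with k-means"));
        List<List<String>> words = index(documents);
        SortedMap<String, Integer> df = documentFrequencies(words);
        double[][] vectors = tfIdf(words, df);
        System.out.println("Vocabulary: " + df.keySet());
        System.out.println("Document 0: " + Arrays.toString(vectors[0]));
    }
}
```

Because most documents use only a small part of the vocabulary, a real implementation would likely store these vectors sparsely (for example, only the non-zero positions); the dense arrays here just keep the sketch short.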
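Once every document is a numeric vector, k-means groups them by alternating two steps: assign each vector to its nearest centroid, then move each centroid to the mean of the vectors assigned to it, until no assignment changes. The following is a minimal sequential sketch of that loop (Lloyd's algorithm), not the book's concurrent version. It assumes Euclidean distance, uses the first k vectors as initial centroids so the result is deterministic, and caps the number of iterations. The class `KMeansSketch` and its `cluster` method are hypothetical names.

```java
import java.util.Arrays;

/**
 * Hypothetical sequential k-means sketch over document vectors.
 * Assumptions: Euclidean distance, first k vectors as initial
 * centroids, and a fixed cap on the number of iterations.
 */
public class KMeansSketch {

    // Returns, for each input vector, the index of the cluster it was assigned to.
    public static int[] cluster(double[][] data, int k, int maxIterations) {
        if (k <= 0 || k > data.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of vectors");
        }
        int n = data.length;
        int dim = data[0].length;

        // Initial centroids: copies of the first k vectors.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = data[c].clone();
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each vector joins its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(data[i], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no vector moved to a different cluster
            }

            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    // Squared Euclidean distance; the square root is unnecessary for comparisons.
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {10, 10}, {0, 1}, {10, 11} };
        System.out.println(Arrays.toString(cluster(data, 2, 100))); // prints [0, 1, 0, 1]
    }
}
```

The assignment step is the natural place for parallelism, since the nearest centroid of each vector can be computed independently; the update step then only needs per-cluster sums and counts.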