An example of a document clustering application

This application will read a set of documents and will organize them using the k-means clustering algorithms. To achieve this, we will use four components:

  • The Reader system: This system will read all the documents and convert every document into a list of String objects.
  • The Indexer system: This system will process the documents and convert them into a list of words. At the same time, it will generate the global vocabulary of the set of documents with all the words that appear on them.
  • The Mapper system: This system will convert each list of words into a mathematical representation using the vector space model. The value of each item will be the Tf-Idf (short for term frequency–inverse document frequency ...

Get Mastering Concurrency Programming with Java 8 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.