The second example – creating an inverted index for a collection of documents
In information retrieval, an inverted index is a common data structure used to speed up text searches over a collection of documents. It stores every word that appears in the collection and, for each word, a list of the documents that contain it.
To build the index, we parse every document in the collection and construct the index incrementally. For each document, we extract its significant words (removing the most common words, known as stop words, and possibly applying a stemming algorithm) and then add those words to the index. If a word exists in the index, we add the document to the list of documents associated with that ...
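The incremental construction described above can be sketched in a few lines of sequential Java. This is only an illustration under stated assumptions, not the book's implementation: the class name `InvertedIndexSketch`, the methods `addDocument` and `search`, and the tiny stop-word list are all hypothetical, stemming is skipped, and a word that is not yet in the index is assumed to get a new, empty entry before the document is added.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Hypothetical, deliberately tiny stop-word list; a real one is much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "in", "is", "of", "the", "to"));

    // word -> names of the documents that contain that word
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Adds the significant words of one document to the index.
    public void addDocument(String docName, String text) {
        // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty tokens and stop words
            }
            // Existing word: reuse its document set. New word (assumed
            // behavior): create an empty set first. Then record the document.
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
        }
    }

    // Returns the documents that contain the word, or an empty set.
    public Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT),
                Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument("doc1.txt", "The cat sat in the garden");
        idx.addDocument("doc2.txt", "A cat and a dog");
        System.out.println(idx.search("cat")); // prints [doc1.txt, doc2.txt]
        System.out.println(idx.search("the")); // prints [] (stop word)
    }
}
```

A set is used instead of a list for each word so that a document is recorded only once even when the word occurs in it several times. The sketch is single-threaded; sharing the map between threads would require a concurrent structure.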