We now have enough background to proceed to the code.
First, suppose that we have a text file in which each line is a document, and we want to index its content so that we can query it. For example, we can take some text from https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt and save it to data/simple-text.txt.
Then we can read it this way:
Path path = Paths.get("data/simple-text.txt");
List<List<String>> documents = Files.lines(path, StandardCharsets.UTF_8)
        .map(line -> TextUtils.tokenize(line))
        .map(line -> TextUtils.removeStopwords(line))
        .collect(Collectors.toList());
We use the Files class from the standard library, and then use two functions: ...
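To make the reading snippet easier to follow, here is a minimal sketch of what the two TextUtils helpers might look like. It is not the original implementation: the class name TextUtilsSketch, the splitting rule (lowercase, then split on anything that is not a letter or digit), and the very short stop word list are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for TextUtils; the real helper may behave differently.
public class TextUtilsSketch {
    // Assumption: a token is a maximal run of letters or digits.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Assumption: a tiny illustrative stop word list; a real one is much longer.
    private static final Set<String> STOPWORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is", "or");

    // Lowercases the line and splits it into tokens, dropping empty strings.
    public static List<String> tokenize(String line) {
        return Arrays.stream(NON_WORD.split(line.toLowerCase(Locale.ROOT)))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Keeps only the tokens that are not in the stop word list.
    public static List<String> removeStopwords(List<String> tokens) {
        return tokens.stream()
                .filter(token -> !STOPWORDS.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("To be, or not to be: that is the question");
        System.out.println(removeStopwords(tokens));
    }
}
```

With helpers of this shape, each line of the file becomes a list of lowercase tokens with the most common function words removed, which is the form the documents list in the reading snippet holds.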