Summarizing texts by extracting the most representative sentences
In this recipe, we are going to use an extractive method to build a summary out of a set of text documents. By extractive, we mean that instead of interpreting the source documents and rephrasing their content more concisely, we'll detect the most salient sentences in those documents and present them, unchanged, as the summary of the text.
The algorithm we are going to use, called LexRank, is inspired by Google's PageRank. The idea behind it is that if we represent every sentence in the documents as a vector, we can build a graph that ties all of these sentences together. Every edge drawn between a pair of sentences is weighted by ...
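To make the graph idea concrete, here is a minimal Python sketch of a LexRank-style ranking. It assumes, as in the published LexRank algorithm, that edge weights are cosine similarities between TF-IDF sentence vectors. The function names (`tokenize`, `tf_idf_vectors`, `cosine`, `lexrank_scores`, `summarize`), the smoothed IDF formula, the damping factor of 0.85, and the fixed iteration count are all illustrative assumptions, not the recipe's own implementation.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_idf_vectors(sentences):
    # One sparse TF-IDF vector (term -> weight) per sentence.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, log(1 + n / df), so weights stay positive.
        vectors.append({t: c * math.log(1 + n / doc_freq[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexrank_scores(sentences, damping=0.85, iterations=50):
    # Build the weighted sentence graph, then run PageRank-style power iteration.
    n = len(sentences)
    if n == 0:
        return []
    vectors = tf_idf_vectors(sentences)
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            incoming = sum(weights[j][i] / out_totals[j] * scores[j]
                           for j in range(n) if out_totals[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=2):
    # Keep the k highest-scoring sentences, returned in their original order.
    scores = lexrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the two sentences that are most strongly connected to the rest of the graph. A sentence that shares no vocabulary with the others has no edges, so it only receives the baseline score and is the least likely to be selected.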