Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Summarization using gensim

Gensim includes a summarizer based on an improved version of the TextRank algorithm by Rada Mihalcea et al. TextRank is a graph-based algorithm that uses the keywords in a document as vertices. The weight of the edge between two keywords is determined by how often they co-occur in the text. An algorithm similar to PageRank then scores the importance of each keyword. Finally, the summary is extracted by ranking the sentences that contain highly ranked keywords. Because the summary is built from sentences taken verbatim from the source, TextRank is an extractive summarizer. We will look at a simple example using the gensim summarizer. As test data, we will use the nltk product review corpus:

from nltk.corpus import product_reviews_1 ...
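
To make the TextRank description above more concrete, here is a minimal sketch of the idea in plain Python. It is not gensim's implementation and does not use the nltk corpus. The stopword list, the co-occurrence window of 2, the damping factor of 0.85, and all function names (`build_graph`, `rank_keywords`, `extractive_summary`) are assumptions chosen for illustration only.

```python
import re

# Tiny illustrative stopword list (an assumption, not taken from gensim or nltk).
STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "in",
             "this", "but", "was", "i", "for", "on", "with", "very", "that"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep non-stopword tokens as keywords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def build_graph(sentences, window=2):
    """Keywords are vertices; edge weights count co-occurrences within a window."""
    graph = {}
    for sentence in sentences:
        words = tokenize(sentence)
        for i, w in enumerate(words):
            for v in words[i + 1:i + 1 + window]:
                if v == w:
                    continue
                graph.setdefault(w, {}).setdefault(v, 0.0)
                graph.setdefault(v, {}).setdefault(w, 0.0)
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def rank_keywords(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted keyword graph."""
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        updated = {}
        for w in graph:
            incoming = 0.0
            for v, weight in graph[w].items():
                # Each neighbour v shares its score in proportion to edge weight.
                incoming += weight / sum(graph[v].values()) * scores[v]
            updated[w] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


def extractive_summary(text, n_sentences=1):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = rank_keywords(build_graph(sentences))

    def sentence_score(sentence):
        return sum(scores.get(w, 0.0) for w in set(tokenize(sentence)))

    top = sorted(sentences, key=sentence_score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]


if __name__ == "__main__":
    review = ("The camera takes sharp photos. "
              "The camera battery lasts long and the camera photos look sharp. "
              "Shipping was slow.")
    print(extractive_summary(review, n_sentences=1))
```

Scoring a sentence by the sum of its keyword scores favours longer sentences. A real summarizer would normally normalize for length, so treat this only as an illustration of the ranking idea.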
