4 Textual similarity

This chapter covers

  • Representing data for authorship analysis with deep learning
  • Applying classifiers to authorship attribution
  • Understanding the merits of MLPs and CNNs for authorship attribution
  • Verifying authorship with Siamese networks

One of the most common tasks in natural language processing (NLP) is determining whether two texts are similar. Applications include

  • Document retrieval—Determining query-result similarity

  • Topic labeling—Assigning a topic to an unlabeled text based on similarity with a set of labeled texts

  • Authorship analysis—Determining whether a text is written by a certain author, based on texts attributed to that author
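To make the notion of similarity behind these applications concrete, here is a minimal sketch that scores two texts by the cosine of the angle between their word-count vectors. It is a simple bag-of-words baseline chosen for illustration, not the deep learning approach this chapter develops, and the function names (bag_of_words, cosine_similarity, most_similar) are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine between two word-count vectors; 0.0 means no shared words."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return the document that scores highest against the query."""
    return max(documents, key=lambda doc: cosine_similarity(query, doc))
```

Under these assumptions, document retrieval amounts to calling most_similar with a query and a list of candidate results, and topic labeling amounts to copying the label of the most similar labeled text. Word counts ignore word order and meaning, which is one motivation for learned representations.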

We will approach the topic of text similarity from the perspective ...
